Preface


Advanced Micro Devices is recognized as the pioneer and leader in microprogrammable “bit slice” 
integrated circuits. The Am29300 family sets the current standard in general purpose 32-bit building 
blocks. Designed for high performance and flexibility with a choice of elegant, easy to implement 
architectures, this chip set brings microprogrammable products into the next generation. 


The Am29300 generation gives the system designer flexibility both in hardware architecture and at 
the microprogram level. This 32-bit product family achieves high performance and high integration, 
while avoiding architectural restrictions. The products are designed to meet the high computational 
requirements of advanced graphics systems, image processing, high-end controllers, fault-tolerant 
processors, work stations, and other 32-bit applications limited not by process technology, but only 
by the designer’s imagination. 


Chapters 2, 3, and 4 of this databook describe the current full range of the Am29300 product offerings 
in bipolar and CMOS technologies. Three different types of data sheets are presented: Advanced 
Information, Preliminary, and Final. 


• Advanced Information data sheets are developed from simulation data after
  circuit design is completed. After a process change, advanced information is
  again provided for speed select data.


• Preliminary data sheets are based on actual measurements when silicon is
  available and units have been tested for AC characteristics. The preliminary test
  programs are in place, but the normal fabrication process variations have not
  allowed setting of final AC limits.


• Final data-sheet status is applied to products that are fully characterized over
  the operating range and are in volume production.


Over 75 application notes and technical articles have been written in 11 different languages 
describing the features and benefits of the Am29300/29C300 family. A few representative articles 
are reprinted in Chapter 6 to serve as a starting point for readers less familiar with the broad scope 
of this chip set. A full list of articles is offered in the bibliography of Chapter 6. 


Technical information regarding product and process reliability, as well as the Advanced Micro
Devices model for reliability studies, is provided in Chapter 7. This chapter also outlines the basic
thermal characteristic data for the bipolar Am29300 products and describes test philosophy and 
methods. 


Chapter 8 gives general information regarding package outlines and ordering information. 
1.1 Am29300/29C300 GENERAL OVERVIEW


CMOS and Bipolar 32-Bit High Performance 
Building Blocks 


AMD’s Am29300/29C300 family has been developed to 
provide systems designers with flexible, off-the-shelf, 
high-performance, 32-bit microprogrammable building 
blocks. The Am29300/29C300 family is ideal for com- 
plex and calculation-intensive applications such as intel- 
ligent peripheral controllers including graphics, telecom- 
munications, switching systems and laser printers; artifi- 
cial intelligence and RISC CPUs; array and digital signal 
processing; and a multitude of military applications. 


Am29300/29C300 Pushes the Limits of 
Your Imagination 


Flexibility of Design 


Success is driven by innovation and differentiation. While
“me too” systems companies merely struggle to be the 
lowest cost manufacturers, innovative companies strive 
ahead toward the future. The designers of AMD's 32-bit 
family recognize the need for system innovation and 
differentiation. The Am29300/29C300 family provides 
powerful building blocks with unlimited architectural flexi- 
bility, thus returning design innovation and value-added 
back to the design engineer. With the flexibility of custom 
architectures and custom microcode, system perform- 
ance is limited only by imagination. 


Improve Your Time to Market


Because AMD’s 32-bit family integrates high-performance
features such as master/slave, parity checking, funnel
shifters, priority encoders, and mask generators, the
Am29300/29C300 family meets the complex functional
requirements of sophisticated systems and can eliminate
the need for custom ICs. With the Am29300/29C300 there
are no engineering circuit turnaround delays, no hidden
Non-Recurring-Engineering costs, no complex test
engineering correlations, and no waiting. Off-the-shelf
availability of a highly integrated, fully tested product
of guaranteed quality can mean improved profits for the
system application.


Specifications that Count 


We provide you with the tools and data necessary to 
make your design right the first time. You can be assured
that the specifications of the parts you order are guaran- 
teed by AMD as printed in the data sheets. Designers 
require worst case guaranteed parameter values, and 
AMD provides them. AMD removes the uncertainty of 
customized design with fully guaranteed, standard, off- 
the-shelf, 32-bit products. These state-of-the-art bipolar 
and CMOS building blocks are the ideal solution for 32- 
bit applications. 


Military Product Position 


AMD is committed to support the industry with military 
qualified and specified Am29C300 family products. The 
entire family is being offered as 883C level B fully 
compliant APL products. In addition, we plan to release 
the family in DESC military drawings. This will provide the 
user with alternatives to source control drawings, thus 
saving cost and time. 
Manufacturing: Processes and Planning


AMD’s Commitment to Process Technology 
Improvements


The Am2901 industry standard bit-slice ALU is an ideal
example of AMD’s commitment to process improvements.
Table 1-1 and Figure 1-1 demonstrate the performance
improvements of the Am2901. Since its introduction, the
Am2901’s performance has increased nearly three-fold
while its price has dropped by a factor of ten. This
represents a 25 percent annual price/performance
improvement over 12 years. The philosophy of performance
improvements through process technologies applies to all
members of AMD’s microprogrammable products.


Table 1-1

                                                                  Speed
Year        Device      Technology                       Die Size    (A,B → G,P)  Power
1975        Am2901      Low-Power Schottky               33 K mil²   80 ns        1.5 W
1977        Am2901A     Dual Layer Metal,                20 K mil²   65 ns        1.5 W
                        Ion Implantation
1978        Am2901B     Projection Printing              15 K mil²   50 ns        1.5 W
1981        Am2901C     ECL Internal, TTL I/O, IMOX      15 K mil²   37 ns        1.5 W
1986        Am29C01     1.6 µm CMOS                      15 K mil²   37 ns        0.5 W
1987        Am29C01-1   1.2 µm CMOS, Speed Select        15 K mil²   28 ns        0.5 W
1987 (est)  Am29C01-2   1.0 µm CMOS                      15 K mil²   19 ns        0.5 W
Figure 1-1. Am2901 Performance


Figure 1-2. Am29300/29C300 Performance
Bipolar VLSI


The Am29300 family contains some of the largest bipolar
ICs produced anywhere in the world. For example, the
Am29332 has over 5,000 gates, 31,000 devices, and
measures 142,000 mils². AMD’s IMOX S-2 process allows
for such integration and high performance. Future
advances in AMD’s bipolar process will include process
“tweaks” as well as total changes in process approach.
These advances will provide improved performance and
yields, directly affecting the price/performance of the
Am29300 family.


CMOS VLSI 


The Am29C300 family, like its bipolar counterpart, also 
contains very large die. The Am29C325 encompasses 
nearly 11,000 gates and measures almost 130,000 mils².


AMD’s CS-11 is the current CMOS workhorse process
for the Am29C300 family. At an effective channel length of
1.6 microns, CS-11 is capable of approaching the bipolar
speeds on all specifications.


There will be continued process improvements to the
current CMOS technology. The first improvement,
CS-11A, will be available on all Am29C300 products in
Q4 1987. CS-11A has an effective channel length of 1.2
microns, resulting in a 25 percent performance
improvement over CS-11.


Table 1-2 demonstrates the performance improvements 
expected on the Am29C300 family as these processes
are incorporated into the family. 


Table 1-2 CMOS Evolution

                    Effective         Typical
Year    Process     Channel Length    Gate Delay
1986    CS-11       1.6 micron        1.25 ns
1987    CS-11A      1.2 micron        0.90 ns
1988    CS-21       1.0 micron        0.65 ns


The Philosophy Behind the Functionality 


When AMD introduced the 4-bit slice (memory plus ALU)
Am2901 in 1975, semiconductor and packaging
technologies prevented the integration of a 16- or 32-bit
unit. The 4-bit slice with internal memory and external
carry-look-ahead and a 48-pin package were the right
compromise then. Today, semiconductor and packaging
technologies have advanced to a point where a full 32-bit
ALU with many non-sliceable features, internal
carry-look-ahead, and systems access to all buses can be
put on one chip, with expandable memory on another. This
results in higher versatility and higher performance.


There are several reasons for the choice of a wider data
path. First, cycle time is improved significantly if carry
lookahead is contained entirely on the chip. Second,
certain powerful on-chip functions, such as the funnel
shifter, priority encoder, and mask generator, are
extremely difficult to “slice.” Third, a higher level of
integration leads to a more cost-effective system solution.
These and other advantages contributed to the decision
to make the Am29332/29C332 a complete 32-bit function
rather than a slice.


The Am29300/29C300 philosophy has also removed 
the register file from the ALU, providing the designer 
greater system flexibility and making expansion and 
regular addressing much easier. The new partitioning 
results in a number of benefits. The user gets a func- 
tionally more powerful processor with two uncommitted 
input buses and gains the flexibility of adding storage 
elements to those buses. The Am29300/29C300 family 
is designed to be the most functional and powerful family 
of microprogrammable building block products available 
on the market. 


1.2 Am29300/29C300 FAMILY DEVICE 
OVERVIEW 


The Am29332/29C332 32-Bit ALU — The 
Heart of a New Generation of Machines 


The Am29332/29C332 is AMD’s first 32-bit-wide ALU.
Parallel processing of 32 bits of data, coupled with very 
fast cycle time, provides throughput unprecedented in 
VLSI-based systems. 


The 32-bit ALU combines maximum performance and 
integration by keeping all critical timing paths short and 
balanced. All ALU instructions have the same short cycle 
time. This includes barrel shifting, normalization, priority 
encoding and field logical operations. 
Figure 1-3. Am29332/29C332 32-Bit ALU


Three Ports Facilitate High Throughput 


The Am29332/29C332 has two input ports (A and B) and 
an output port (Y), all 32 bits wide. These three ports 
provide flexibility and accessibility for high-performance 
processor designs. Dedicated input and output ports 
provide a flow-through architecture and avoid the penalty 
associated with switching a bidirectional bus halfway 
through the cycle. In addition, the three-bus architecture 
allows easy parallel connection of other arithmetic units 
for even higher performance. 


Arithmetic and Logic Unit 


The 32-bit wide ALU in the Am29332/29C332 has full
carry-lookahead to improve cycle time for all arithmetic
operations. The ALU is a unique three-input structure
with two data input ports and a mask input that is used on
every cycle, thus providing very powerful instructions
that execute in a single cycle. The mask supports
byte-aligned arithmetic operations and field logical
operations on variable-position, variable-length fields.
The byte-aligned arithmetic operations use 8-, 16-, 24-,
and 32-bit LSB-aligned operands. Field-logical
instructions operate on operands of arbitrary length and
starting position.


Priority Encoder


The priority encoder generates a 5-bit vector indicating
the highest order ‘one’ in the 32-bit operand. These 5 bits
are then stored in the position field of the status register
for use during the next cycle. The priority encoder
supports all byte-aligned data types; the result is
dependent upon the byte width specified. This function
supports normalization necessary for floating point
operations; it also enhances certain graphics primitives.
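
The prioritize function described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the bit index is counted from the LSB (the data sheet defines the device's actual encoding of the 5-bit vector), and returning None for an all-zero operand is a choice made for this sketch rather than documented device behavior.

```python
def priority_encode(operand, width_bytes=4):
    """Return the index (0 = LSB) of the highest-order one bit.

    width_bytes selects an 8-, 16-, 24- or 32-bit LSB-aligned operand.
    Returns None when the selected field holds no ones (a choice made
    for this sketch, not documented device behavior).
    """
    if width_bytes not in (1, 2, 3, 4):
        raise ValueError("width_bytes must be 1, 2, 3 or 4")
    width_bits = 8 * width_bytes
    value = operand & ((1 << width_bits) - 1)
    if value == 0:
        return None
    return value.bit_length() - 1


def normalize_shift(operand, width_bytes=4):
    """Left-shift count that moves the leading one to the top bit."""
    position = priority_encode(operand, width_bytes)
    if position is None:
        return None
    return 8 * width_bytes - 1 - position
```

The second helper shows why the result depends on the byte width: the same leading-one position yields a different normalization shift for a 16-bit operand than for a 32-bit one.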
64-Bit Funnel Shifter


The on-board 64-bit input, 32-bit output funnel shifter is 
much more than a conventional barrel shifter. The shifter 
can extract any contiguous field of 32 bits from a 64-bit 
input. This input may consist of concatenated A and B 
input words or, for barrel shifting, duplicated A or B input 
words. 


Residing in the ALU data path, the shifter can perform an
n-bit shift or rotate in conjunction with a logical ALU
operation, all in the same cycle, without increasing the
length of the cycle. This capability affords single-cycle
execution of logical operations between unaligned fields,
a function that would take multiple cycles in other
architectures.
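
A minimal software model may make the funnel-shift idea concrete. The sketch below assumes the offset is counted from the least significant end of the 64-bit input; the function names are illustrative and say nothing about how the device encodes its shift amount.

```python
MASK32 = 0xFFFFFFFF


def funnel_shift(high, low, offset):
    """Extract 32 contiguous bits from the 64-bit value high:low.

    offset is the bit position (0..32) of the extracted field's LSB
    within the 64-bit input; this numbering is an assumption.
    """
    if not 0 <= offset <= 32:
        raise ValueError("offset must be in the range 0..32")
    combined = ((high & MASK32) << 32) | (low & MASK32)
    return (combined >> offset) & MASK32


def rotate_right(word, count):
    """Barrel rotation: feed the same word to both halves of the input."""
    return funnel_shift(word, word, count % 32)


def shift_and_combine(a, b, offset, operand):
    """Shift, then apply a logical operation (AND here) to the result."""
    return funnel_shift(a, b, offset) & operand
```

Duplicating one word on both halves turns the field extraction into a rotate, which is the barrel-shifting case mentioned in the text; concatenating two different words gives a multiword shift.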


Mask Generator 


The power and flexibility of the processor stems partly 
from its ability to generate a mask to control the width of 
an operation for each instruction without any cycle time 
penalty. The mask generator at the ALU input creates a 
contiguous field of ones and contains its own shifter to 
position this control field anywhere in the data path. The 
mask generator can also be used as a pattern generator, 
bypassing the mask through the ALU. 
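
As a rough illustration of mask generation and a field logical operation, consider the following sketch. The helper names are hypothetical, and the choice to pass the destination operand through unchanged outside the field is an assumption made for the example, not a statement about the device's instruction definitions.

```python
MASK32 = 0xFFFFFFFF


def make_mask(width, position):
    """Contiguous field of `width` ones whose LSB sits at bit `position`."""
    if not 0 <= width <= 32 or not 0 <= position <= 32 - width:
        raise ValueError("field must lie inside the 32-bit data path")
    return ((1 << width) - 1) << position


def field_or(dest, src, width, position):
    """OR src into dest inside the field only.

    Bits outside the field are taken from dest unchanged (an
    assumption of this sketch).
    """
    mask = make_mask(width, position)
    inside = (dest | src) & mask
    outside = dest & ~mask & MASK32
    return inside | outside
```

Used on its own, `make_mask` also models the pattern-generator case: the mask value itself is the result.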


Status Register 


The processor has a 32-bit wide status register that 
contains: information on position and width of the oper- 
and; the ALU status flags Carry, Negative, Overflow, and 
Zero; status bits for evaluation of inequalities; a link bit for 
multiprecision shifts; an M flag for high speed multiply 
and divide; and intermediate nibble carries for BCD 
arithmetic. An extract-status instruction is provided that 
allows any bit from the status register to be output at the 
Y-port. This is particularly useful in machines employing 
stack architectures. Instructions to save and restore the 
status register are also provided. 


Multiply and Divide Support 


The chip incorporates dedicated hardware to allow
efficient implementation of multiply and divide algorithms
for both unsigned and signed arithmetic data types. The
modified Booth multiply algorithm processes two bits
per cycle. The four-quadrant, non-restoring divide
algorithm processes one bit per cycle. Since the data path
width is fixed at 32 bits, the instructions can be simplified
to provide “first step,” “iterate step” and “last step”
commands for both multiply and divide. Programming slices
is no longer necessary since all multiply and divide steps
are provided in the instruction set. For business-oriented
machines, the ALU is capable of performing BCD
arithmetic on packed BCD numbers. In order to keep
non-BCD operations fast, BCD arithmetic is executed by
binary arithmetic followed by BCD correction.
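
To show what "two bits per cycle" means, here is a sketch of the textbook radix-4 (modified) Booth recoding for signed operands. It models the arithmetic only: the split into first, iterate, and last step microinstructions, the status register's M flag, and the unsigned case are not represented, and the function names are illustrative.

```python
def to_signed(value, bits=32):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_multiply(multiplicand, multiplier, bits=32):
    """Signed multiply using radix-4 Booth recoding.

    Returns (product, steps). Each step consumes two multiplier bits,
    so a 32-bit multiplier takes 16 steps.
    """
    x = to_signed(multiplicand, bits)
    y = multiplier & ((1 << bits) - 1)
    product = 0
    steps = 0
    previous = 0  # implicit zero to the right of the multiplier LSB
    for i in range(0, bits, 2):
        low = (y >> i) & 1
        high = (y >> (i + 1)) & 1
        digit = low + previous - 2 * high  # one of -2, -1, 0, 1, 2
        product += (digit * x) << i
        previous = high
        steps += 1
    return product, steps
```

Each recoded digit selects a multiple of the multiplicand that needs only a shift and an optional negation, which is why the iteration count is half the operand width.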


The Instruction Set: Powerful and Flexible 
Yet Simple and Regular 


The Am29332/29C332 instruction set complements the 
powerful hardware. To ease the task of code generation, 
the instruction set is symmetrical and regular. There are 
two large classes of instructions. The first class handles 
byte-aligned data (8-, 16-, 24-, or 32-bit LSB-aligned). It 
is comprised of: data movement instructions; arithmetic 
instructions, including multiply and divide steps and BCD 
instructions; logical instructions; and single-bit shift and 
prioritize operations. The second class of instructions 
operates on variable-length, variable-position fields. It 
includes N-bit shift and rotate, field extract, and field 
logical operations. 


The Am29331/29C331 — 16-Bit 
Micro-Interruptible Sequencer 


The Am29331/29C331 is a high speed sequencer con- 
trolling the sequence of microinstructions stored in mi- 
croprogram memory. The instruction set aids structured 
microprogramming and handles sequential execution, 
branches, subroutines and loops. The sequencer in- 
structions may be unconditional or conditional based on 
CPU status, an on-board 8-input test multiplexer, and a 
polarity control. The sequencer has a 16-bit wide address 
path and can thus access 64K words of microcode 
memory. It is transparently interruptible at any microin- 
struction boundary. 
Figure 1-4. Am29331/29C331 16-Bit Microinterruptible Sequencer


Balanced Timing Means Greater Throughput 


In previous generation microprogrammed systems, the
control path containing the sequencer has often been the
bottleneck, because the sequencers were slower than
the associated data paths. Not so in the Am29300/
29C300 family. The speed of the Am29331/29C331
sequencer has been designed such that the entire system
timing is balanced between the control path and data
path, leading to higher overall throughput.


Micro-Level Interruptible 


Real time interrupt handling at the microinstruction level 
is made possible by the interrupt return address register 
and the bidirectional Y-port. While the interrupt address 
enters the part through the Y-port, the interrupt return 
address is saved on the stack. Nested interrupts are 
handled the same way. 


Built-in Trap Handling 


As an architectural alternative to the interrupt-driven 
approach, the Am29331/29C331 Sequencer also has 
provision for handling “traps” transparently at the micro- 
instruction level, upon the occurrence of specified sys- 
tem events. In this mode, the current microinstruction is 
aborted. The specified trap routine is executed (like an 
interrupt). But, following the trap routine, the aborted 
microinstruction is re-executed (instead of proceeding on 
to the next microinstruction, as in an interrupt). 
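
The difference between the two return behaviors can be illustrated with a toy trace of microinstruction addresses. This sketch assumes a purely sequential microprogram and a hypothetical two-word service routine at addresses 100 and 101; it does not model the sequencer's stack, ports, or timing.

```python
def microinstruction_trace(length, event_at=None, event=None,
                           handler=(100, 101)):
    """List the addresses executed by a sequential microprogram.

    event is None, "interrupt" or "trap"; event_at is the address at
    which it occurs. The handler addresses are hypothetical.
    """
    if event not in (None, "interrupt", "trap"):
        raise ValueError("event must be None, 'interrupt' or 'trap'")
    trace = []
    address = 0
    pending = event
    while address < length:
        if pending is not None and address == event_at:
            if pending == "interrupt":
                trace.append(address)   # current microinstruction completes
                resume = address + 1    # then continue with the next one
            else:
                resume = address        # trap: aborted, re-executed later
            trace.extend(handler)
            pending = None
            address = resume
            continue
        trace.append(address)
        address += 1
    return trace
```

With an event at address 2 of a four-word program, the interrupt case yields 0, 1, 2, 100, 101, 3, while the trap case yields 0, 1, 100, 101, 2, 3: the aborted microinstruction runs only after the trap routine.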


33-Level Stack 


The 33-level stack provides sufficient depth to handle 
nested loops and subroutines; it is also used to save the 
status of the sequencer when handling interrupts. Since 
the stack is externally accessible, its contents may be 
unloaded through the bidirectional D-port for diagnostic,
debugging or fault recovery purposes. The stack may 
also be loaded from the outside through the D-port. This 
may be used for context switching, for example. 


Multitasking Support 


By providing a HOLD control pin, the designer may use 
multiple sequencers in a multitasking system, with only 
one sequencer active at any one time. The output Y-ports 
of the sequencers are tied together to address the same 
microcode memory. This is useful, for example, for rapid 
context switching at the microinstruction level. 


Address Comparator Eases Debugging 


The sequencer compares the address on the Y-port with 
the contents of an internal break-point register. Break- 
point detection is useful for debugging the system or 
gathering run-time statistics. 


Two Branch Address Inputs


Two separate branch address inputs, D and A, are 
provided to speed up source address selection. Both A 
and D ports can be used to load the counter. The D port 
can also be used to load or unload the stack while the A 
port may be used to input a branch or map address, 
eliminating the need to three-state selected sources. 


Built-in Test Generation Logic 


In the Am29331/29C331, unlike previous sequencers, 
test generation logic and one layer of condition test 
multiplexer logic are built-in. This not only reduces 
component count, but also improves cycle time by mini- 
mizing inter-chip delays and by moving the multiplexer 
into fast internal ECL gates. 


Multiway 


Four sets of four-bit multiway inputs are provided. Each
such set of 4 bits can replace the four least significant bits
of the D input, allowing a direct branch to any of 16
consecutive locations in the microprogram memory. The
multiway capability allows checking of up to four
simultaneous test conditions in a single cycle. This is
obviously an attractive alternative to checking each test
condition serially, a much slower multicycle process.
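
The address substitution behind a multiway branch is simple enough to sketch directly. The function and parameter names below are illustrative assumptions; only the replacement of the four least significant D bits comes from the text.

```python
def multiway_branch(d_address, multiway_sets, select):
    """Form a branch address from D and one of four 4-bit multiway sets.

    The selected set replaces the four least significant bits of the
    16-bit D input, so the target is one of 16 consecutive locations.
    """
    if len(multiway_sets) != 4:
        raise ValueError("exactly four multiway sets are expected")
    bits = multiway_sets[select]
    if not 0 <= bits <= 0xF:
        raise ValueError("each multiway set is four bits wide")
    return (d_address & 0xFFF0) | bits
```

If each of the four bits carries one test condition, a single branch dispatches on all four at once, landing on one of 16 entry points that start at a 16-word-aligned base address.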


The Most Versatile Sequencer Ever 


The combination of 16 bits of address, real time interrupt
capability, two address ports, a deep stack and other
capabilities makes this device the most feature-loaded
sequencer ever offered.


The Am29334/29C334 Register File 


The Am29334/29C334 is a 64 word by 18 bit, dual- 
access, four-port register file. It is deliberately separate 
from the ALU to allow easy, regular expansion, both 
horizontally for wide data paths and vertically for large 
register file machines. 


Four-Port Architecture


Two Read and two Write data ports allow independent 
and simultaneous access to two register file locations. 
The Read and Write ports are separated to eliminate the 
delay caused by turn-around of bidirectional buses. The 
dual-address, four-port architecture allows any combina- 
tion of two reads, writes, or read-writes — no restrictions. 


Organization Supports Parity 


Since the Am29334/29C334 has a by-18 organization, it 
can store two bytes with parity in each of its 64 words. As 
a data path storage element, the register file neither 
generates nor checks parity. When used in conjunction 
with the Am29332/29C332 processor (which provides 
parity checking on its inputs and parity generation on its 
output), it provides a bus compatible register file, thus 
extending parity protection to the entire data path loop. 
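
The by-18 organization can be pictured as two 9-bit groups, each holding a data byte and its parity bit. The sketch below assumes even parity and places each parity bit directly above its byte; both choices are assumptions for illustration, since the parity sense and pin assignment are defined in the device data sheets.

```python
def parity_bit(byte):
    """Even-parity bit: makes the total count of ones in the group even."""
    return bin(byte & 0xFF).count("1") & 1


def pack_word(high_byte, low_byte):
    """Build an 18-bit word from two bytes, each with its parity bit."""
    high = (parity_bit(high_byte) << 8) | (high_byte & 0xFF)
    low = (parity_bit(low_byte) << 8) | (low_byte & 0xFF)
    return (high << 9) | low


def check_word(word):
    """Return True when both 9-bit groups have an even number of ones."""
    for shift in (0, 9):
        group = (word >> shift) & 0x1FF
        if bin(group).count("1") & 1:
            return False
    return True
```

In this model the register file simply stores all 18 bits; generation (`pack_word`) and checking (`check_word`) correspond to functions the text assigns to the processor, not to the register file.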


Array Processing Products/Arithmetic 
Accelerators 


The Am29300/29C300 family is capable of very fast 
operation on 32-bit fixed-point numbers. When greater 
dynamic range is necessary, floating-point numbers 
are often chosen. Advanced Micro Devices offers high- 
speed VLSI integrated circuits designed to support the 
growing need for high-performance array and signal 
processing. Applications include graphics, image 
processing, communications, medical instrumentation, 
radar and other electronic warfare applications. Three 
AMD devices address these needs: Am29325/29C325 
32-bit Floating-Point Processor, Am29C323 32x32-bit 
Multiprecision Multiplier, and Am29C327 64-bit Float- 
ing-Point Processor. These devices achieve very high 
speeds through a combination of innovative architec- 
ture and AMD’s advanced bipolar IMOX process and 
CMOS process. 
Am29325/29C325


The Am29325/29C325 is a high-speed, single precision 
floating-point processor. It performs 32-bit floating-point 
addition, subtraction and multiplication operations in a 
single device, using either IEEE-P754, draft 10.0 or 
DEC VAX format. 


Single-Cycle Execution 


Since performance is the objective, all 
instructions—including multiply—require only one cycle to 
execute. 


No Mandatory Pipelining 


Although the Am29325/29C325 FPP has input and out- 
put registers to make it a general purpose accelerator, 
there are no pipeline registers internal to the floating point 
array. Even the I/O registers can be made transparent. 


Three-Bus Architecture 


The Am29325/29C325, like the Am29332/29C332, has
a three-bus architecture, with two input buses and one
output bus, thereby providing a bus compatible
accelerator. This configuration provides high I/O
bandwidth, allowing the user to take full advantage of the
single cycle, high-speed, floating-point ALU. Naturally,
the input and output registers have individual clock
enables. In addition, the input and output registers
may be made transparent with independent feed-through
controls. The rules remain consistent: the system
architecture achieves the highest performance when the
component architectures do not interfere.


Powerful Instruction Set 


The Am29325/29C325 executes the following instruc- 
tions: 


• Add (R plus S)
• Subtract (R minus S)
• Multiply (R times S)
• Constant Subtract (2 minus S)
• Integer to Floating Point Conversion
• Floating Point to Integer Conversion
• IEEE to DEC Format Conversion
• DEC to IEEE Format Conversion


The instruction (2 minus S) is provided to support the 
Newton-Raphson division algorithm. 
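
The role of the (2 minus S) instruction is easiest to see in the Newton-Raphson reciprocal iteration x' = x * (2 - s * x), where each pass uses one multiply, one constant subtract, and a second multiply. The sketch below is a plain Python model: it uses double-precision floats rather than the device's 32-bit format, and the seed value is supplied by the caller as an assumption (a real system might take it from a lookup table).

```python
def reciprocal(s, seed, iterations=5):
    """Approximate 1/s by Newton-Raphson iteration.

    Converges when 0 < s * seed < 2; the relative error is roughly
    squared on every iteration.
    """
    x = seed
    for _ in range(iterations):
        product = s * x        # Multiply (R times S)
        factor = 2.0 - product  # Constant Subtract (2 minus S)
        x = x * factor         # Multiply (R times S)
    return x


def divide(r, s, seed, iterations=5):
    """Compute r / s as r times the iterated reciprocal of s."""
    return r * reciprocal(s, seed, iterations)
```

For example, `divide(10.0, 4.0, seed=0.2)` starts with a relative error of 0.2 in the reciprocal and approaches 2.5 within a handful of iterations.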


Internal Data Paths Support Accumulation 


The Am29325/29C325 has two internal feedback paths 
to facilitate two-cycle internal multiply-accumulate op- 
eration. The F1 bus can store the results of the multiply 
operation in an input register for subsequent accumula- 
tion. The F2 bus lets the output register function as an 
accumulator by making its output available as an oper- 
and for the next cycle. 


Figure 1-7. Am29325/29C325 32-Bit Floating Point Processor
Am29C325 Stand-Alone Performance


The Am29C325 is a stand-alone CMOS Floating Point
Processor. When used with a simple sequencer such as
the Am29C10A, it can be used as a low-cost floating-point
engine for applications requiring iterative algorithms
such as Chebyshev and Newton-Raphson. These
algorithms are used extensively in guidance, image and
signal processing, and other DSP applications.


Programmable I/O Structure 


To provide compatibility with different system buses,
controls are provided for the following options:


• Two 32-bit input buses and one 32-bit output bus
• One 32-bit input bus and one 32-bit output bus
• Two 16-bit input buses and one 16-bit output bus


The input modes affect only the manner in which
operands are entered into the device. The operation
of the floating-point ALU is not altered. For example,
in the 32-bit/one input-bus mode, the two 32-bit inputs
are tied together and the two input operands are
clocked into the input registers on alternate rising and
falling edges of the clock. In the 16-bit, 3-bus mode, the
32-bit operands are delivered on two consecutive clock
cycles in 16-bit increments.


Am29C327 Double-Precision 
Floating-Point Processor 


The Am29C327 double-precision floating-point proces- 
sor is a high performance, single VLSI device that imple- 
ments an extensive floating-point and integer instruction 
set. It can perform operations on single-, double-, or 
mixed-precision operands. The three most popular
floating-point formats (IEEE, DEC, and IBM) are supported.
IEEE operations comply with the standard P754, with
direct implementation of special features such as gradual
underflow and trap handling.
Figure 1-8. Microcoded Floating Point Co-Processor
Flow-Through or Pipelined


Operations can be performed in either of two modes: 
flow-through or pipelined. In the flow-through mode, the 
ALU is completely combinatorial; this mode is best suited 
for scalar operations. Pipelined mode divides the ALU
into one or two pipelined stages for use in vector
operations, as are often found in graphics or signal processing.


Three-Bus Architecture 


The Am29C327 has two input buses and one output bus 
— a three-bus architecture just like the Am29C325 float- 
ing-point processor. It provides flexibility and ease of 
interface, making it a very high performance accelerator. 


Input/Output Modes 


The Am29C327 supports eight I/O modes which provide 
a flexible interface to a variety of 32-bit and 64-bit 
systems. The input buses can be configured as separate 
32-bit input buses or as a single 64-bit input bus. It is 
possible to load two 64-bit operands in a single clock 
cycle. The input modes are: 


32-bit, double-cycle, LSWs first 
32-bit, double-cycle, MSWs first 
32-bit, single-cycle, LSWs first 
32-bit, single-cycle, MSWs first 
64-bit, double-cycle, R first 
64-bit, double-cycle, S first 
64-bit, single-cycle, R first 
64-bit, single-cycle, S first 


Integer or Floating-Point 


In addition to supporting 32-bit and 64-bit integer opera- 
tions, the Am29C327 supports the following floating- 
point formats in single- or double-precision: 


IEEE P754 version 10.1 
DEC F, DEC D, and DEC G formats 
IBM system 370 format. 


Conversion between the floating-point formats and con- 
version between floating-point and integer formats are 
also provided. This is a very powerful feature not avail- 
able in any other architecture. 


Mixed-Precision Operations 


All Am29C327 instructions, floating-point or integer,
can be performed on either single- or double-precision
operands. In addition, the user can elect to mix precisions
within an operation. All operations are internally
performed in double precision; the user specifies the
desired precision of the input and output operands. The
necessary precision conversions are made in concert
with the selected operation, with no additional cycle-time
overhead.


Register File and Internal Datapath Support 
Compound Operations 


The ALU of the Am29C327 has three data input ports and
can perform operations of the form (A*B)+C. An
eight-deep register file for storing intermediate results
used in recursive operations, together with the on-chip
64-bit datapath, facilitates compound operations such as
Newton-Raphson division, sum-of-products, and
transcendentals.


Comprehensive Floating-Point and Integer 
Instruction Sets 


The Am29C327 implements an extensive number of 
arithmetic and logical instructions. These instructions fall 
into the following categories: 


addition/subtraction
multiplication
multiplication/accumulation
comparison
max/min
saturation (clipping)
rounding to integral value
absolute value, negation
reciprocal seed generation
floating-point ↔ floating-point conversion
floating-point ↔ integer conversion
integer ↔ integer conversion
pass operand
logical operations; e.g. AND, OR, XOR, NOT
move data


By concatenating these operations, the user can also 
perform division, square-root extraction, polynomial 
evaluation, and other functions not implemented directly. 
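
As an illustration of concatenating operations, polynomial evaluation and sum-of-products both reduce to repeated use of a single (A*B)+C step. The sketch below uses ordinary Python floats and hypothetical function names; it shows the sequence of operations only and does not model the device's register file, precision control, or rounding.

```python
def multiply_accumulate(a, b, c):
    """One compound operation of the form (A*B)+C."""
    return a * b + c


def evaluate_polynomial(coefficients, x):
    """Horner's rule; coefficients are listed highest degree first."""
    result = 0.0
    for coefficient in coefficients:
        result = multiply_accumulate(result, x, coefficient)
    return result


def sum_of_products(xs, ys):
    """Accumulate x[i] * y[i] with one compound operation per term."""
    total = 0.0
    for x, y in zip(xs, ys):
        total = multiply_accumulate(x, y, total)
    return total
```

Horner's rule needs exactly one compound operation per coefficient, with the running result fed back as the A operand each time, which is the kind of recursion an on-chip register file is well suited to hold.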


Am29C323 Multiplier 


The Am29C323 is a high-speed parallel 32x32-bit multi- 
plier designed to speed up systems using fixed or float- 
ing-point notation. 


Three-Bus Architecture 


Just like other members of the family, the Am29C323 has 
two input buses and one output bus. This configuration 
provides high I/O bandwidth, allowing the user to take full 
advantage of the high-speed parallel multiplier core of 
the device. 
Figure 1-10. Am29C323 32x32 Parallel Multiplier


Multiprecision Multiplication Made Easy 


By including 32-bit shift and accumulate to generate
partial products, the internal architecture of the
Am29C323 supports fast multiprecision multiplication.
Both input ports have dual 32-bit registers, and the output
port can select from a 67-bit product register, a 32-bit
temporary register, or directly from the 32x32-bit
multiplier array. A complete 32x32-bit clocked
multiplication takes a single cycle (naturally, and with no
pipelining!). Multiprecision multiplication uses the shift
and accumulate logic to collect partial products starting
with the least significant product. The number of cycles
depends upon the input data width, with three-cycle
latency, as shown in the table below. By using the I/O
registers for pipelining, much greater throughput can be
achieved. For example, by overlapping 64x64-bit
operations, a full 128-bit product is available every four
cycles. Multiplying the mantissas of two double-precision
64-bit floating-point numbers, for example, is one possible
application of this high speed multiprecision
multiplication capability.


                  Number of Cycles

Operands     Single Product     Overlapped Operations
32x32              1                     1
64x64              7                     4
96x96             12                     9
128x128           19                    16
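
The shift-and-accumulate scheme described above can be sketched in software. The following is a minimal Python model, not the device's microcode: the function names and the column-by-column ordering are assumptions made for illustration. It builds a wide product from 32x32-bit partial products, starting with the least significant one, and counts the multiplies. For operands of n 32-bit words the count is n squared, which matches the Overlapped Operations column (1, 4, 9, 16); for the multiword cases the Single Product column is three cycles larger, consistent with the three-cycle latency mentioned in the text.

```python
MASK32 = (1 << 32) - 1


def split_words(value, n_words):
    """Split an unsigned integer into n_words 32-bit words, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(n_words)]


def multiprecision_multiply(a, b, n_words):
    """Multiply two (32 * n_words)-bit unsigned operands with 32x32 multiplies only.

    Returns (product, number_of_32x32_multiplies).
    """
    a_words = split_words(a, n_words)
    b_words = split_words(b, n_words)
    accumulator = 0      # models a wide accumulate register (hypothetical name)
    result_words = []
    multiplies = 0
    # Collect partial products column by column, least significant first.
    for column in range(2 * n_words - 1):
        for i in range(n_words):
            j = column - i
            if 0 <= j < n_words:
                accumulator += a_words[i] * b_words[j]
                multiplies += 1
        result_words.append(accumulator & MASK32)  # emit one finished 32-bit word
        accumulator >>= 32                         # 32-bit shift before next column
    result_words.append(accumulator & MASK32)      # most significant word
    product = sum(word << (32 * k) for k, word in enumerate(result_words))
    return product, multiplies
```

In this model a 64x64-bit product needs four partial products and a 128x128-bit product needs sixteen.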
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Registered Buses 


All buses in the device are registered, and each register 
has its own Clock Enable. The device operates from a 
single clock, ideal for microprogrammed systems. All 
ports — input, output, and instruction — can be made 
transparent independently. 


Complete Interlocking Fault Detection 


To enhance system reliability by ensuring data integrity
and correct hardware operation, the family supports both
master/slave fault detection and data path parity. The
system features byte parity checking on the inputs and
byte parity generation on the outputs of the Am29332/
29C332 ALU and Am29C323 32x32-bit multiplier. Also,
the organization of the Am29334/29C334 64x18 register
file accommodates parity bits for each byte. The parity
mechanism assures data path integrity. Major functional
blocks (the Am29332/29C332 ALU, Am29331/29C331
sequencer, Am29C323 32x32-bit multiplier, and
Am29C327 64-bit floating-point processor) have
"master/slave fault detection" to ensure correct operation
without having to carry parity through complex internal
logic (shifters, mask generators, etc.) and without having
to pay the resulting delay penalties. In master/slave
mode, two functional units are connected in parallel,
with one unit doing the actual operation and the other
checking the result on a cycle-by-cycle, bit-by-bit basis.
The master is used for the normal data path. In the slave,
however, all outputs become inputs, and the slave
compares the outputs of the master with its own internally
generated result. If the two don't match, an error signal
is generated, triggering an interrupt at the
microinstruction level. No specialized software is required
for the master/slave scheme. Also, the designer can choose
to impose redundancy at the component or board level.
The parity mechanism and the master/slave concept,
which use cost-effective hardware rather than expensive
software, provide a comprehensive solution for
fault-tolerant systems.
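
The two checking mechanisms can be illustrated with a small behavioral sketch. This Python model is only an illustration under stated assumptions: the text does not give the parity polarity, so even parity is assumed by default with an `odd` switch, and all function names are hypothetical.

```python
def byte_parity_bits(word, odd=False):
    """Return four parity bits for a 32-bit word, one per byte, byte 0 first.

    Even parity is an assumption; pass odd=True for the opposite polarity.
    """
    bits = []
    for byte_index in range(4):
        byte = (word >> (8 * byte_index)) & 0xFF
        parity = bin(byte).count("1") & 1  # 1 when the byte holds an odd number of ones
        bits.append(parity ^ 1 if odd else parity)
    return bits


def parity_error(word, parity_bits, odd=False):
    """Input-side check: True when the supplied parity bits do not match the data."""
    return byte_parity_bits(word, odd) != list(parity_bits)


def slave_error(master_outputs, slave_result):
    """Master/slave check: the slave compares the master's outputs, bit for bit,
    with its own internally generated result and flags any difference."""
    return master_outputs != slave_result
```

A single flipped data bit changes one byte's parity, so `parity_error` reports it; a fault inside a functional unit shows up as a mismatch in `slave_error` without carrying parity through the unit.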
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Figure 1-11. Input Parity Checking / Output Parity Checking 
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Figure 1-12. Master/Slave Error Checking 
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Am29337 16-Bit Bounds Checker 


The need for simple yet sophisticated functionality and 
board space savings created the Am29337, a 16-bit 
bounds checker. This product provides inexpensive, 
easy-to-use solutions for the following applications:

• intelligent address decoder
• window clipping in graphics
• filter in DSP
• memory protection systems
• RISC processors
• multi/parallel processors
• logic analyzers
• tag/data buffers


The Am29337 compares incoming 16-bit data against
both lower and upper bounds and reports whether the
data is inside or outside the bounds. It can be cascaded
for 32-bit data and longer without sacrificing speed.
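
The comparison can be sketched as follows. This is a minimal Python model, assuming inclusive bounds (the text does not say whether the limits themselves count as inside); the function names are hypothetical. It shows why the signed/unsigned selection matters: the same 16-bit patterns give different answers under the two interpretations.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def in_bounds(data, lower, upper, signed=False):
    """Return True when lower <= data <= upper for 16-bit values.

    Inclusive limits are an assumption made for this sketch.
    """
    if signed:
        data, lower, upper = (to_signed16(v) for v in (data, lower, upper))
    else:
        data, lower, upper = (v & 0xFFFF for v in (data, lower, upper))
    return lower <= data <= upper
```

For example, with a lower bound of 0xFFF0 and an upper bound of 0x0010, the value 0x0000 is inside the window when the numbers are treated as signed (-16 to +16) and outside when they are treated as unsigned.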


The Am29337 is housed in a 400 mil ceramic 28-pin DIP 
for board space savings. 


User Benefits 


• Replaces MSI devices, saves board space
• Low-cost solution compared to conventional alternatives


Distinctive Features


• Double comparators compare a 16-bit input number
  against a lower and an upper limit
• 16-bit operation, cascadable to longer words
• Compares signed or unsigned numbers
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Figure 1-13. Am29337 Block Diagram 
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Am29338 32-Bit Byte Queue 


The Am29338 is a general purpose 32-bit intelligent 
FIFO that allows up to four bytes to be queued or de- 
queued in a single cycle. 


Fabricated with AMD's IMOX-S2 technology and housed
in a 120-pin PGA, the Am29338 meets the requirements
for a high-speed FIFO buffer with minimum real estate.
The part will also be made available in high-speed,
low-power 1.2 micron CMOS technology.


Features of the Am29338 include: 


• Queuing of up to 128 bytes
• Queuing or de-queuing of up to 4 bytes at a time
• Byte rotation on the inputs and outputs
• Asynchronous/synchronous operations
• Accepts 8-, 16-, 24-, and 32-bit input data
• Repetitive queuing of block data
• Almost empty/full signal if less than 4 bytes available
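
The queuing behavior in the feature list can be modeled in a few lines. The sketch below is a behavioral illustration only, not a description of the device's interface: the class and method names are hypothetical, and the almost-empty/almost-full thresholds are one reading of "less than 4 bytes available" (fewer than four bytes queued, or fewer than four free locations).

```python
from collections import deque


class ByteQueue:
    """Behavioral model of a byte FIFO: 128 bytes deep, 1 to 4 bytes per transfer."""

    CAPACITY = 128
    MAX_PER_CYCLE = 4

    def __init__(self):
        self._bytes = deque()

    def enqueue(self, data):
        """Queue 1 to 4 bytes in one operation."""
        data = bytes(data)
        if not 1 <= len(data) <= self.MAX_PER_CYCLE:
            raise ValueError("1 to 4 bytes may be queued at a time")
        if len(self._bytes) + len(data) > self.CAPACITY:
            raise OverflowError("queue full")
        self._bytes.extend(data)

    def dequeue(self, count):
        """Remove and return 1 to 4 bytes in one operation, oldest first."""
        if not 1 <= count <= self.MAX_PER_CYCLE:
            raise ValueError("1 to 4 bytes may be de-queued at a time")
        if count > len(self._bytes):
            raise IndexError("not enough bytes queued")
        return bytes(self._bytes.popleft() for _ in range(count))

    @property
    def almost_empty(self):
        # Assumption: fewer than 4 bytes are available to de-queue.
        return len(self._bytes) < 4

    @property
    def almost_full(self):
        # Assumption: fewer than 4 free byte locations remain.
        return self.CAPACITY - len(self._bytes) < 4
```

Because input and output widths are independent, three bytes queued in one operation and two in the next can be de-queued as a single four-byte word plus one remaining byte, which is the basis of the bus-conversion applications.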
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Figure 1-14. Am29338 Block Diagram 
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Significant User Benefits 


The Am29338 is an excellent choice for a wide variety of
system design problems. Its benefits include a shorter
design cycle when compared with implementing the
same functions with traditional FIFOs, higher performance,
off-the-shelf functionality, less board space, and
less power than the separate parts needed to combine
this logic.


Applications 


• Hardware mailbox between two heterogeneous
  processors
• I/O bus buffers between a processor and
  controller
• Instruction prefetch queue for byte-addressable
  microprocessor systems
• Write buffer between CPU and main memory
• Bus conversions: 8, 16, 24, and 32 bits
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1.3 A.C. AND D.C. PARAMETER DEFINITIONS 


Definition of A.C. Switching Terms 


fMAX    The highest operating clock frequency.

tPLH    The propagation delay time from an input change to an output LOW-to-HIGH transition.

tPHL    The propagation delay time from an input change to an output HIGH-to-LOW transition.

Pulse width. The time between the leading and trailing edges of a pulse.

Rise time. The time required for a signal to change from 10% to 90% of its measured values.

Fall time. The time required for a signal to change from 90% to 10% of its measured values.

Set-up time. The time interval for which a signal must be applied and maintained at one input terminal
before an active transition occurs at another terminal.

Hold time. The time interval for which a signal must be retained at one input after an active transition occurs
at another input terminal.

HIGH to disable. The delay time from a control input change to the output transition from the HIGH-level
to high-impedance (measured at 0.5 V change).

LOW to disable. The delay time from a control input change to the output transition from the LOW-level
to high-impedance (measured at 0.5 V change).

Enable HIGH. The delay time from a control input change to the output transition from high-impedance
to HIGH-level.

Enable LOW. The delay time from a control input change to the output transition from high-impedance
to LOW-level.


Definition of D.C. Terms 


Power dissipation capacitance used to determine the no-load dynamic current consumption.

H                   HIGH, applying to a HIGH voltage level.
L                   LOW, applying to a LOW voltage level.
I                   Input
O                   Output
Negative Current    Current flowing out of the device.
Positive Current    Current flowing into the device.

LOW-level input current with a specified LOW-level voltage applied.
HIGH-level input current with a specified HIGH-level voltage applied.
LOW-level output current.
HIGH-level output current.
Output short-circuit source current.
Supply current drawn by the device from the VCC power supply.

IOZH                Three-state off-state output current, HIGH-level voltage applied.
IOZL                Three-state off-state output current, LOW-level voltage applied.
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VCC     The range of supply voltage over which the device is guaranteed to operate.

VIL     The highest input voltage that is guaranteed to be recognized by the device as a logic LOW.

VIH     The lowest input voltage that is guaranteed to be recognized by the device as a logic HIGH.

VOL     The highest logic LOW voltage guaranteed at the output terminal while sinking the specified load current
        IOL.

VOH     The lowest logic HIGH voltage guaranteed at the output terminal when sourcing the specified source
        current IOH.

IEE     The supply current drawn by the device from the VEE power supply for an ECL circuit.

VEE     Most negative power supply for an ECL circuit.
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CMOS Family 

Am29C331 CMOS 16-Bit Microprogram Sequencer 
Am29C332 CMOS 32-Bit Arithmetic Logic Unit 
Am29C334 CMOS Four-Port Dual-Access Register File 
Am29C325 CMOS 32-Bit Floating-Point Processor*


Am29C327 CMOS Double-Precision Floating-Point Processor* 


* Front page only of data sheet. See Chapter 4 for complete data sheet. 
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Am29C331 


CMOS 16-Bit Microprogram Sequencer 


PRELIMINARY 


DISTINCTIVE CHARACTERISTICS 


• 16-Bits Address up to 64K Words
  Supports 110-ns microcycle time for a 32-bit
  high-performance system when used with the other
  members of the Am29C300 Family.
• Speed Select
  Supports 80-ns system cycle time.
• Real-Time Interrupt Support
  Micro-trap and interrupts are handled transparently
  at any microinstruction boundary.
• Built-In Conditional Test Logic
  Has twelve external test inputs, four of which are
  used to internally generate an additional four test
  conditions. Test multiplexer selects one out of 16
  test inputs.
• Break-Point Logic
  Built-in address comparator allows break-points in
  the microcode for debugging and statistics collection.
• Master/Slave Error Checking
  Two sequencers can operate in parallel as a master
  and a slave. The slave generates a fault flag for
  unequal results.
• 33-Level Stack
  Provides support for interrupts, loops, and subroutine
  nesting. It can be accessed through the D-bus
  to support diagnostics.


GENERAL DESCRIPTION 


The Am29C331 is a 16-bit wide, high-speed single-chip 
sequencer designed to control the execution sequence of 
microinstructions stored in the microprogram memory. The 
instruction set is designed to resemble high-level language 
constructs, thereby bringing high-level language program- 
ming to the micro level. 


The Am29C331 is interruptible at any microinstruction
boundary to support real-time interrupts. Interrupts are
handled transparently to the microprogrammer as an
unexpected procedure call. Traps are also handled
transparently at any microinstruction boundary. This feature
allows re-execution of the prior microinstruction. Two
separate buses are provided to bring a branch address directly
into the chip from two sources to avoid slow turn-on and
turn-off times for different sources connected to the
data-input bus. Four sets of multiway inputs are also provided
to avoid slow turn-on and turn-off times for different
branch-address sources. This feature allows implementation of
table look-up or use of external conditions as part of a branch
address. The 33-deep stack provides the ability to support
interrupts, loops, and subroutine nesting. The stack can be
read through the D-bus to support diagnostics or to implement
multitasking at the micro-architecture level. The master/slave
mode provides a complete function check capability for the
device.


Fabricated using Advanced Micro Devices’ 1.6 micron 
CMOS process, the Am29C331 is powered by a single 5- 
volt supply. The device is housed in a 120-terminal pin-grid 
array package. 
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Figure 1. Am29C331 Detailed Block Diagram 
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*Pins facing up. 





CONNECTION DIAGRAM 
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ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:

a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29C331  -1  G  C


a. DEVICE NUMBER/DESCRIPTION
   Am29C331
   CMOS 16-Bit Microprogram Sequencer

b. SPEED OPTION
   -1 = Speed Select
   -2 = Speed Select (TBD)

c. PACKAGE TYPE
   G = 120-Lead Pin Grid Array without Heatsink
   (CGX120)

d. TEMPERATURE RANGE
   C = Commercial (0 to +70°C)

e. OPTIONAL PROCESSING
   Blank = Standard processing
   B = Burn-in


Valid Combinations


AM29C331      GC, GCB
AM29C331-1    GC, GCB


Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released valid combinations,
and to obtain additional data on AMD's standard military
grade products.
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MILITARY ORDERING INFORMATION 
APL Products 


AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL
products is formed by a combination of:

a. Device Number
b. Speed Option (if applicable)
c. Device Class
d. Package Type
e. Lead Finish


AM29C331  /B  Z  C


a. DEVICE NUMBER/DESCRIPTION
   Am29C331
   CMOS 16-Bit Microprogram Sequencer

b. SPEED OPTION
   Not Applicable

c. DEVICE CLASS
   /B = Class B

d. PACKAGE TYPE
   Z = 120-Lead Pin Grid Array without Heatsink
   (CGX120)

e. LEAD FINISH
   C = Gold


Valid Combinations

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations or to check for newly released valid
combinations.


Group A Tests


Group A tests consist of Subgroups
1, 2, 3, 7, 8, 9, 10, 11.
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PIN DESCRIPTION 


A0-A15   Alternate Data (Input)
Input to address multiplexer and counter.

A-FULL   Almost Full (Bidirectional; Three-State)
Indicates that 28 ≤ SP ≤ 63 (meaning there are five or fewer
empty locations left on the stack). Also active during stack
underflow.

Cin   Carry In (Input; Active LOW)
Carry-in to the incrementer.

CP   Clock Pulse (Input)
Clocks sequencer at the LOW-to-HIGH transition.

D0-D15   Data (Bidirectional; Three-State)
Input to address multiplexer, counter, stack, and comparator
register. Output for stack and stack pointer.

EQUAL   Equal (Bidirectional; Three-State)
Indicates that the address comparator is enabled and has
found a match.

ERROR   Error (Output)
Indicates a master/slave error in the slave mode. Indicates
a malfunctioning driver or contention of any output in the
master mode.

FC   Force Continue (Input)
Overrides instruction with CONTINUE.

HOLD   Hold (Input)
Stops the sequencer and three-states the outputs.

I0-I5   Instruction (Input)
Selects one of 64 instructions.

INTA   Interrupt Acknowledge (Bidirectional; Three-State, Active LOW)
Indicates that an interrupt is accepted.

INTEN   Interrupt Enable (Input)
Enables interrupts.

INTR   Interrupt Request (Input)
Requests the sequencer to interrupt execution.

M0-3, 0-3   Multiway (Input)
Four sets of multiway inputs providing 16-way branches.
The first index refers to the set number.

OED   Output Enable, D-Bus (Input)
Enables the D-bus driver, provided that the sequencer is not
in the hold or slave mode.

RST   Reset (Input; Active LOW)
Resets the sequencer.

S0-S3   Select (Input)
Selects one of 16 test conditions.

SLAVE   Slave (Input)
Makes the sequencer a slave.

T0-T11   Test (Input)
Provides external test inputs.

Y0-Y15   Address (Bidirectional; Three-State)
Output of microcode address. Input for interrupt address.


FUNCTIONAL DESCRIPTION
Architecture


The major blocks of the sequencer are the address multiplexer,
the address register (AR), the stack (with the top of stack
denoted TOS), the counter (C), the test multiplexer with logic,
and the address comparison register (R) (Figure 1). The
bidirectional D-bus provides branch addresses and iteration
counts; it also allows access to the stack from the outside.
The A-bus may be used for map addresses. There are four
sets of four-bit multiway branch inputs (M). The bidirectional
Y-bus either outputs microprogram addresses or inputs interrupt
addresses. The buses are all 16 bits wide. Figure 1 shows a
detailed block diagram of the sequencer.


Address Multiplexer


The address multiplexer can select an address from any of
five sources:


1) A branch address supplied by the D-bus
2) A branch address supplied by the A-bus
3) A multiway-branch address
4) A return or loop address from the top of stack
5) The next sequential address from the incrementer


Multiway-Branch Address 


A multiway-branch address is formed by substituting the lower
four bits of the address on the D-bus (D3, D2, D1, D0) with one
of the four sets (M0X, M1X, M2X, or M3X) of four-bit
multiway-branch addresses. The multiway-branch set is selected
by the number D1D0, while the bits D3 and D2 are "don't cares"
(see Figure 2).


D1   D0   Multiway Set Selected
0    0    M0X
0    1    M1X
1    0    M2X
1    1    M3X


[Figure 2 diagram: the base address on the D-bus and the multiway inputs form the branch address, which indexes one of four sixteen-entry lookup tables: Table 1 (M0X), Table 2 (M1X), Table 3 (M2X), Table 4 (M3X).]


Notes: 1. D1 and D0 select one out of four multiway sets. D3 and D2 are "don't cares."
2. Each set of M3X-M0X can select one of sixteen locations. The multiway-branch address is the
concatenation of D15-D4 (base address) and MX3-MX0.
3. For a given base address, there can be four look-up tables, each sixteen deep.


Figure 2. Multiway Branch
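

The address formation in Figure 2 can be written out as a short sketch. This Python function is an illustration only: its name and argument layout are hypothetical, and it assumes that D1 and D0 select the multiway set while D3 and D2 are ignored, which is the reading consistent with the D = A15 ... A4XX00 pattern in the Figure 3-3 caption.

```python
def multiway_branch_address(d_bus, m_inputs):
    """Form a 16-bit multiway-branch address.

    d_bus    -- 16-bit value on the D-bus
    m_inputs -- four 4-bit values; m_inputs[k] is multiway set k (MkX)
    """
    set_number = d_bus & 0x3   # D1, D0 select one of the four sets (assumption)
    base = d_bus & 0xFFF0      # D15-D4 form the base address; D3, D2 are ignored
    return base | (m_inputs[set_number] & 0xF)
```

With a base address of 0x12A0, each of the four sets gives access to its own sixteen-entry table, because the set number is taken from bits that never reach the Y-bus.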


Address Register and Incrementer 


The address register contains the current address. It is loaded
from the interrupt multiplexer and feeds the incrementer. The
incrementer is inhibited if Cin is taken HIGH.


Stack 


A 33-word-deep and 16 bit-wide stack provides first-in last-out 
storage for return addresses, loop addresses, and counter 
values. Items to be pushed come from the incrementer, the 
interrupt-return-address register, the counter, or the D-bus. 
Items popped go to the address multiplexer, the counter, or 
the D-bus. 


The access to the stack via the D-bus may be used for context
switching, stack extension, or diagnostics. As the stack is only
accessible from the top, stack extension is done by temporarily
storing the whole or some lower part of the stack outside the
sequencer. The save and the later restore are done with pop
and push operations, respectively, at balanced points in the
microprogram; for example, points with the same stack depth.
The internal D-bus driver must be turned on when popping an
item to the D-bus; if the driver is off, the item will be unstacked
instead. The driver is normally turned on when the Output
Enable signal is asserted and the sequencer is not being reset
(OED = 1, RST = 1).


The stack pointer is a modulo 64 counter, which is incremented
on each push and decremented on each pop. The stack
pointer is reset to zero when the sequencer is reset, but the
pointer may also be reset by instruction. Thus, the stack
pointer indicates the number of items on the stack as long as
stack overflow or underflow has not occurred. Overflow
happens when an item is pushed onto a full stack, whereby
the item at the bottom of the stack is overwritten. Underflow
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happens when an item is popped from an empty stack; in this 
case the item is undefined. 


In the case of stack overflow, the SP is incremented for every
push after overflow. Thus, immediately after the first
occurrence of stack overflow, the SP will be equal to 34.
Subsequent pushes will increment the SP to 35, 36 ... 61, 62, 63, 0,
1, etc. In the case of stack underflow, the SP is decremented
for every pop after underflow. Thus, immediately after the first
occurrence of stack underflow, the SP will be equal to 63.
Subsequent pops will decrement the SP to 62, 61, ... 2, 1, 0,
63, etc.


The contents of the stack pointer are present on the D-bus for
all instructions except POP D, provided the driver is turned on.
The output signal, A-FULL, is active under the following
condition: 28 ≤ SP ≤ 63.
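

The stack pointer behavior described above can be checked with a small model. This Python sketch tracks only the pointer, not the stack contents; the class name is hypothetical, and the A-FULL range is taken as inclusive at both ends (28 through 63), which matches the statement that five or fewer of the 33 locations are then free and that the signal is also active after underflow.

```python
class StackPointerModel:
    """Models the modulo-64 stack pointer of a 33-deep stack."""

    DEPTH = 33

    def __init__(self):
        self.sp = 0              # reset state

    def reset(self):
        self.sp = 0

    def push(self):
        self.sp = (self.sp + 1) % 64   # incremented on each push

    def pop(self):
        self.sp = (self.sp - 1) % 64   # decremented on each pop

    @property
    def a_full(self):
        # Assumption: both limits are inclusive.
        return 28 <= self.sp <= 63
```

Pushing 34 items onto an empty stack leaves the pointer at 34, and a single pop from an empty stack leaves it at 63; both states assert A-FULL in this model.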


Counter 


The counter may be used as a loop counter. It may be loaded 
from the D-bus, the A-bus, or via a pop from the stack. Its 
contents may also be pushed onto the stack. 


A normal for-loop is set up by a FOR instruction, which loads 
the counter from the D- or A-bus with the desired number of 
iterations; the instruction also pushes onto the stack a loop 
address that points to the next sequential instruction. The end 
of the loop is given by an unconditional END FOR instruction, 
which tests the counter value against the value one and then 
decrements the counter. If the values differ, the loop is 
repeated by selecting the address at the stack as the next 
address. If the values are equal, the loop is terminated by 
popping the stack, thereby removing the loop address, and 
selecting the address from the incrementer as the next 
address. The number of iterations is a 16-bit unsigned number, 
except that the number zero corresponds to 65,536 iterations. 
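
The iteration count follows directly from the test-then-decrement order of END FOR. The following Python sketch, with a hypothetical function name, models only the 16-bit counter and shows why a loaded value of zero gives 65,536 passes.

```python
def for_loop_iterations(count):
    """Return how many times the loop body runs for a given FOR count."""
    counter = count & 0xFFFF               # the counter is 16 bits wide
    iterations = 0
    while True:
        iterations += 1                    # one pass through the loop body
        at_end = (counter == 1)            # END FOR tests the counter against one...
        counter = (counter - 1) & 0xFFFF   # ...and then decrements it
        if at_end:
            return iterations              # loop terminates; stack is popped
```

A count of zero never equals one on the first test, so the counter wraps to 65,535 and the loop runs until it has counted back down to one.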

















By pushing and popping counter values it is possible to handle
nested loops.


Address Comparison 


The sequencer is able to compare the address from the 
interrupt multiplexer with the contents of the comparator 
register. The instruction SET loads the comparator register 
with the address on the D-bus and enables the comparison, 
while CLEAR disables it. The comparison is disabled at reset. 
A HIGH is present at the output EQUAL if the comparison is 
enabled and the two addresses are equal. The comparison is 
useful for detection of a break point or counting the number of 
times a microinstruction at a specific address is executed. 


Instruction Set 


The sequencer has 64 instructions that are divided into four
classes of 16 instructions each. The instruction lines I0-I5
use I5 and I4 to select a class, and I0-I3 to select an
instruction within a class. The classes are:


I5   I4   Classes
0    0    Conditional sequence control,
0    1    Conditional sequence control with inverted
          polarity,
1    0    Unconditional sequence control, and
1    1    Special function with implicit continue.


Note that for the first three classes I5 forces the condition to
be true and I4 inverts the condition. The basic instructions of
the first three classes are shown in Table 1 and the
instructions of the fourth class in Table 2.
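
The class selection and the condition rule can be expressed compactly. This Python sketch uses hypothetical names and applies the relation given in the Table 1 legend, Cond. = (Test[S] OR I5) XOR I4, to the first three classes only.

```python
CLASS_NAMES = {
    0b00: "conditional",
    0b01: "conditional, inverted polarity",
    0b10: "unconditional",
    0b11: "special function",
}


def decode(opcode):
    """Split a 6-bit instruction (I5..I0) into (class name, index within class)."""
    i5 = (opcode >> 5) & 1
    i4 = (opcode >> 4) & 1
    return CLASS_NAMES[(i5 << 1) | i4], opcode & 0xF


def condition(opcode, test):
    """Effective condition for the first three classes: (Test OR I5) XOR I4."""
    i5 = (opcode >> 5) & 1
    i4 = (opcode >> 4) & 1
    if i5 and i4:
        raise ValueError("class 11 instructions are special functions")
    return bool((int(bool(test)) | i5) ^ i4)
```

For instance, opcode 0x21 decodes to the unconditional class with index 1, and its condition evaluates true whatever the selected test input is.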


Structured microprogramming is supported by sequencer
instructions that singly or in pairs correspond to high-level
language control constructs. Examples are FOR I := D DOWN
TO 1 DO ... END FOR and CASE N OF ... END CASE. The
instructions have been given high-level language names
where appropriate. Figure 3 shows how to microprogram
important control constructs; the high-level language is on the
left and the microcode on the right.


Test Conditions 


The condition for a conditional instruction is supplied by a test
multiplexer, which selects one out of sixteen tests with the
select lines S0-S3. Twelve of these are supplied directly by
the inputs T0-T11, while the remaining four tests are generated
by the test logic from the inputs T8-T11. The following
table shows the assignments.


(S0-S3)H   Test                 Intended Use

0-7        T0-T7                General
8          T8                   C (Carry)
9          T9                   N (Negative)
A          T10                  V (Overflow)
B          T11                  Z (Zero or equal)
C          T8 + T11             C + Z (Unsigned less than or equal, borrow mode)
D          (NOT T8) + T11       (NOT C) + Z (Unsigned less than or equal)
E          T9 ⊕ T10             N ⊕ V (Signed less than)
F          (T9 ⊕ T10) + T11     (N ⊕ V) + Z (Signed less than or equal)
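

The test multiplexer can be modeled as a simple selection function. This Python sketch is an illustration with a hypothetical function name. One point is an assumption: the extracted table shows the same expression for select codes C and D, so the sketch takes code D to use the complement of T8, which is the form that distinguishes the non-borrow case from the borrow-mode case.

```python
def select_test(select, t):
    """Return the test condition chosen by select (0..15).

    t is a sequence of twelve values, T0..T11.
    """
    if not 0 <= select <= 0xF:
        raise ValueError("select must be in the range 0..15")
    c, n, v, z = (bool(t[8]), bool(t[9]), bool(t[10]), bool(t[11]))
    if select <= 0xB:
        return bool(t[select])       # T0..T11 passed through directly
    if select == 0xC:
        return c or z                # unsigned less than or equal, borrow mode
    if select == 0xD:
        return (not c) or z          # assumption: T8 is complemented here
    if select == 0xE:
        return n != v                # signed less than (N XOR V)
    return (n != v) or z             # signed less than or equal
```

The four derived conditions let a microprogram branch on a comparison result in one instruction instead of combining flag tests in microcode.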


Force Continue 


The sequencer has a force continue (FC) input, which overrides
the instruction inputs I0-I5 with a CONTINUE instruction.
This makes it possible to share the microinstruction field
for the sequencer instruction with some other control or to
initialize a writable control store.


Reset 


In order to start a microprogram properly, the sequencer must 
be reset. The reset works like an instruction overriding both 
the instruction input and the force continue input. The reset 
selects the address 0 at the address multiplexer, forces the 
EQUAL output to LOW, and disregards a potential interrupt 
request. It synchronously disables the address comparison 
and initializes the stack pointer to 0. The contents of the stack 
are invalid after a reset.
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TABLE 1. INSTRUCTION SET for I5I4 = 00, 01, 10


Instruction           Stack

Goto D                -
Call D                Push INC
Exit D                Pop
End for D, C ≠ 1      -
End for D, C = 1      -
Goto A                -
Call A                Push INC
Exit A                Pop
End for A, C ≠ 1      -
End for A, C = 1      -
Goto M                -
Call M                Push INC
Exit M                Pop
End for M, C ≠ 1
End for M, C = 1
End Loop
Call Coroutine        Pop & Push INC
Return                Pop
End for, C ≠ 1        -
End for, C = 1        Pop


Cond. = (Test [S] OR I5) XOR I4
: = Concatenation
C = Counter
INC = Output of Incrementer = AR + 1 (if Cin = LOW)


Note: For unconditional instructions, the action marked under "Cond: Pass" is taken.


TABLE 2. INSTRUCTION SET for I5I4 = 11


Instruction        Operation

Continue           -
For D              Push INC
Decrement          -
Loop               Push INC
Pop D              Pop
Push D             Push D
Reset SP           SP ← 0
For A              Push INC
Pop C              Pop
Push C             Push C
Swap               TOS ↔ C
Push C Load D      Push C
Load D
Load A             -
Set                R ← D, Enable
Clear              Disable


R = Comp. Register
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Interrupts 


The sequencer may be interrupted at the completion of the 
current microcycle by asserting the interrupt request input 
INTR. The return address of the interrupted routine is saved 
on the stack so that nested interrupts can be easily imple- 
mented. An interrupt is accepted if interrupts are enabled and 
the sequencer is not being reset or held (INTEN = HIGH, 
RST = HIGH, and HOLD = LOW). The interrupt-acknowledge 
output (INTA) goes LOW when an interrupt is accepted. 


When there is no interrupt, addresses go from the address 
multiplexer to the Y-bus via the driver, and to the address 
register and the comparator via the interrupt multiplexer. When 
there is an interrupt, the driver of the sequencer is turned off, 
an external driver is turned on, and the interrupt multiplexer is 
switched. The interrupt address is supplied via the external 
driver to the Y-bus, the address register, and the comparator 
(Figure 4). In order to save the address from the address 
multiplexer, the address is stored in the interrupt return 
address register, which for simplicity is clocked every cycle. 
The next microinstruction is the first microinstruction of the 
interrupt routine (Figure 5). 


In this cycle the address in the interrupt return address register 
is automatically pushed onto the stack. Therefore the microin- 
struction in this cycle must not use the stack; if a stack 
operation is programmed, the result is undefined. The instruc- 
tions that do not use the stack are GOTO D, GOTO A, GOTO 
M, CONTINUE, DECREMENT, LOAD D, LOAD A, SET and 
CLEAR. A RETURN instruction terminates the interrupt routine 
and the interrupted routine is resumed. Interrupts only work 
with a single-level control path. 
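
The acceptance rule and the restriction on the first microinstruction of an interrupt routine can be summarized in a short sketch. The Python below is illustrative only; the function names are hypothetical, and the signal arguments are logic levels as named in the text (RST is active LOW, so a HIGH level means the sequencer is not being reset).

```python
# Instructions the text lists as not using the stack.
STACK_FREE_INSTRUCTIONS = frozenset({
    "GOTO D", "GOTO A", "GOTO M", "CONTINUE", "DECREMENT",
    "LOAD D", "LOAD A", "SET", "CLEAR",
})


def interrupt_accepted(intr, inten, rst, hold):
    """True when a requested interrupt is accepted:
    INTEN = HIGH, RST = HIGH (not being reset), and HOLD = LOW."""
    return bool(intr and inten and rst and not hold)


def safe_first_interrupt_instruction(mnemonic):
    """True when the instruction may be used as the first microinstruction of an
    interrupt routine, the cycle in which the return address is pushed."""
    return mnemonic.upper() in STACK_FREE_INSTRUCTIONS
```

A microassembler could apply the second check to every interrupt entry point, since a stack operation in that cycle gives an undefined result.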


Traps 


A trap is an unexpected situation linked to the current
microinstruction that must be handled before the microinstruction
completes and changes the state of the system. An example
of such a situation is an attempt to read a word from memory
across a word boundary in a single cycle. When a trap occurs,
the current microinstruction must be aborted and re-executed
after the execution of a trap routine, which in the meantime will
take corrective measures. An interrupt, on the other hand, is
not linked directly to the current microinstruction, which can
complete safely before an interrupt routine is executed.


Execution of a trap requires that the sequencer ignore the
current microinstruction, select the trap return address at the
address multiplexer, and initiate an interrupt. This will save the
trap return address on the stack and issue the trap address
from an external source (Figure 6). The address register
contains the address of the microinstruction in the pipeline
register, thus the address register already contains the trap
return address when a trap occurs. This address can be
selected by the address multiplexer by disabling the incrementer
(Cin = 1), and using the force continue mode (FC = 1). In
this mode the sequencer ignores the current microinstruction.
The remaining part of the trap handling is done by the interrupt
(Figure 7), thus the section on interrupts also applies to traps.
There is one exception, however. The interrupt enable cannot
be used as a trap enable as it does not control the force
continue mode and the carry-in to the incrementer.


Hold Mode 


The sequencer has a hold mode in which the operation is 
suspended. 


The outputs (Y, INTA, A-FULL & EQUAL) are disabled and the 
sequencer enters the hold mode immediately after the HOLD 
signal goes active. While the sequencer is in this mode, the 
internal state is left unchanged and the D-bus is disabled. The 
outputs (Y, INTA, A-FULL & EQUAL) are enabled again and 
the sequencer leaves the hold mode after the cycle immedi- 
ately after the HOLD signal goes inactive. 


In a time-multiplexed multi-microprocess system there may be 
one sequencer for all processes with microprogrammed con- 
text save and restore, or there may be one sequencer per 
microprocess permitting fast process switch. In the latter case 
the Y-buses of the sequencers are tied together and connect- 
ed to a single microprogram store. A control unit decides on a 
cycle-by-cycle basis what sequencer should be running, and 
activates the HOLD signal to the remaining sequencers. The 
hold mode has higher priority than interrupts, and works 
independently of the reset. The hold mode can only be used 
with a single-level control path. 


Master/Slave Configuration 


In some systems reliability is very important. The master/slave 
configuration that consists of two sequencers operated in 
parallel is able to detect faults in both the interconnect and the 
internal function of the sequencers. One sequencer is the 
master and operates normally. The other is the slave, i.e., all 
outputs except the signal ERROR are turned into inputs and 
connected to the outputs of the master. Since the slave is 
operated in parallel with the master, it can compare its result 
with the result of the master and signal an error if they differ. 
The error signal from the master indicates a malfunctioning 
driver or contention. Because a TTL output goes HIGH when 
power is missing, the ERROR signal also indicates power 
failure. 


High-Level Language Constructs 


An example of high-level language constructs using Am29C331 instructions is given in Figure 3 (3-1, 3-2, 3-3, and 3-4). 


REPEAT                  LOOP

UNTIL CC                END LOOP NOT CC


WHILE CC DO             LOOP
                        IF NOT CC THEN EXIT L

END WHILE               END LOOP
                        L:


LOOP                    LOOP

IF CC THEN EXIT         IF CC THEN EXIT L
END LOOP                END LOOP
                        L:


Figure 3-1. Loops with Unknown Number
of Iterations


                        PUSH D B
CASE I OF               GOTO M
0: -                    A:    -
                              -, RETURN (TO B)
                        A+2:  -
                              -, RETURN (TO B)
                        A+4:  -
                              -, RETURN (TO B)
                        A+6:  -
                              -, RETURN
END CASE                B:


Figure 3-3. Case Statement
(with D = A15 ... A4XX00 and
Mo, 0-3 = Aglilo0 during the 
GOTO M instruction. A;Ag must 
be 00, and X signifies a don’t 
care.) 


FOR CNT := 10 DOWN TO 1 DO    FOR D 10


END FOR                       END FOR


Figure 3-2. Loop with Known Number of Iterations


IF X THEN 
IF Y THEN 


ELSE 


END IF 
ELSE 
IF Z THEN 


ELSE 


END IF 
END IF 


PUSH DC 

IF NOT X THEN GOTO A 
IF NOT Y THEN GOTO B 
RETURN (TO C) 

B: 


RETURN (TO C) 


A: 
IF NOT Z THEN GOTO D 

, RETURN (TO D) 

D: 


-, RETURN (TO C) 


C: 


Figure 3-4. Double-Nested If Statement 














Figure 4. Am29C331 Interrupt Cycle 1 (executing at A)
Figure 5. Am29C331 Interrupt Cycle 2 (executing at B)

While executing the instruction at A, the sequencer is interrupted and directed to B. The figures show A+1 on the stack.


Figure 6. Am29C331 Traps Cycle 1
Figure 7. Am29C331 Traps Cycle 2

A trap occurs at the instruction at A, and the sequencer is directed to B. The instruction at A is trapped by FC = 1, Cin = 1, INTR = 1.





Instruction Set Definition 


Legend: @ = Other instruction              P = Test pass
        © = Instruction being described    F = Test fail
        CC = (Test [S3 - S0])              © = Register in part





Mnemonics Description Execution Example 





BRA_D GOTO D
Unconditional branch to the address specified
by the D inputs. The D port must be disabled to
avoid bus contention.


GOTO A
Unconditional branch to the address specified
by the A inputs.


GOTO Multiway (D15 - D4 Mx3 - Mx0)
Unconditional branch to the address specified
by the M inputs concatenated with the D input.
The lower four bits on the D bus (D3 - D0) are
replaced by one of the four sets of the four-bit
multiway branch addresses. The multiway
branch set is selected by bits D1 and D0 while
bits D3 and D2 are "don't cares."


GOTO TOS
Unconditional branch to the address on the top
of the stack.





00H BRCC_D IF CC THEN GOTO D
ELSE CONTINUE
If CC is HIGH (pass), branch to the address
specified by D. If CC is LOW (fail), continue.
The D port must be disabled to avoid bus
contention.


04H BRCC_A IF CC THEN GOTO A
ELSE CONTINUE
If CC is HIGH (pass), branch to the address
specified by A. If CC is LOW (fail), continue.


08H BRCC_M IF CC THEN GOTO Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is HIGH (pass), branch to the address
specified by D inputs concatenated with the M
inputs. If CC is LOW (fail), continue. The lower
four bits on the D bus (D3 - D0) are replaced by
one of the four sets of the 4-bit multiway
branch addresses. The multiway branch set is
selected by bits D1 and D0 while bits D3 and D2
are "don't cares."


0CH BRCC_S IF CC THEN GOTO TOS
ELSE
POP STACK
CONTINUE
If CC is HIGH (pass), branch to the address on
the top of the stack. If CC is LOW (fail), pop the
stack and continue.


Note: Opcode numbers are in hexadecimal notation. 
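The multiway address formation described in the entries above can be sketched as follows. This is a minimal model, assuming a 16-bit D value and a list of four 4-bit M sets indexed by the value of bits D1 and D0; the function name and the list ordering are choices made for this example, not part of the data sheet.

```python
def multiway_address(d, m_sets):
    """Form the multiway branch address.

    d      -- 16-bit value on the D inputs
    m_sets -- four 4-bit multiway values; index = set number (assumed ordering)
    """
    select = d & 0b11                    # D1, D0 select one of the four sets
    upper = d & 0xFFF0                   # D15 - D4 pass through unchanged
    return upper | (m_sets[select] & 0xF)   # D3, D2 are don't cares


# D = 0x12AE: D1, D0 = binary 10, so set 2 supplies the low four bits.
print(hex(multiway_address(0x12AE, [0x1, 0x2, 0x3, 0x4])))  # 0x12a3
```

Because D3 and D2 are ignored, 0x12A2, 0x12A6, 0x12AA, and 0x12AE all produce the same target address in this model.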
BRNC_D IF NOT CC THEN GOTO D
ELSE CONTINUE
If CC is LOW (pass), branch to the address
specified by D. If CC is HIGH (fail), continue.
The D port must be disabled to avoid bus
contention.


IF NOT CC THEN GOTO A
ELSE CONTINUE
If CC is LOW (pass), branch to the address
specified by A. If CC is HIGH (fail), continue.


IF NOT CC THEN GOTO Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is LOW (pass), branch to the address
specified by D inputs concatenated with the M
inputs. If CC is HIGH (fail), continue. The lower
four bits on the D bus (D3 - D0) are replaced by
one of the four sets of the 4-bit multiway
branch addresses. The multiway branch set is
selected by bits D1 and D0 while bits D3 and D2
are "don't cares."


IF NOT CC THEN GOTO TOS
ELSE
POP STACK
CONTINUE
If CC is LOW (pass), branch to the address on
the top of the stack. If CC is HIGH (fail), pop the
stack and continue.


CALL D
Unconditional branch to the subroutine
specified by the D inputs. Push the return
address (Address Reg. + 1) on the stack. The
D port must be disabled to avoid bus
contention.


CALL A
Unconditional branch to the subroutine
specified by the A inputs. Push the return
address (Address Reg. + 1) on the stack.


CALL Multiway (D15 - D4 Mx3 - Mx0)
Unconditional branch to the subroutine
specified by the D inputs concatenated with the
multiway inputs. Push the return address
(Address Reg. + 1) on the stack. The lower
four bits on the D bus (D3 - D0) are replaced by
one of the four sets of the 4-bit multiway
branch addresses. The multiway branch set is
selected by bits D1 and D0 while bits D3 and D2
are "don't cares."


CALL TOS
Unconditional branch to the subroutine
specified by the address on the top of the
stack. The stack is popped and the return
address (Address Reg. + 1) is then pushed
onto the stack.


CCC_D IF CC, THEN CALL D
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine
specified by the D inputs. Push the return
address (Address Reg. + 1) on the stack. If CC
is LOW (fail), continue. The D port must be
disabled to avoid bus contention.


IF CC, THEN CALL A
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine
specified by the A inputs. Push the return
address (Address Reg. + 1) on the stack. If CC
is LOW (fail), continue.


IF CC, THEN CALL Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine
specified by the D inputs concatenated with the
M inputs. Push the return address (Address
Reg. + 1) on the stack. The lower four bits on
the D bus (D3 - D0) are replaced by one of the
four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


IF CC, THEN CALL TOS
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine
specified by the address on the top of the
stack. The stack is popped and the return
address (Address Reg. + 1) is pushed onto the
stack. If CC is LOW (fail), continue.


IF NOT CC, THEN CALL D
ELSE CONTINUE
If CC is LOW (pass), call the subroutine
specified by the D inputs. Push the return
address (Address Reg. + 1) on the stack. If CC
is HIGH (fail), continue. The D port must be
disabled to avoid bus contention.


IF NOT CC, THEN CALL A
ELSE CONTINUE
If CC is LOW (pass), call the subroutine
specified by the A inputs. Push the return
address (Address Reg. + 1) on the stack. If CC
is HIGH (fail), continue.


IF NOT CC, THEN CALL Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is LOW (pass), call the subroutine
specified by the D inputs concatenated with the
M inputs. Push the return address (Address
Reg. + 1) on the stack. The lower four bits on
the D bus (D3 - D0) are replaced by one of the
four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


IF NOT CC, THEN CALL TOS
ELSE CONTINUE
If CC is LOW (pass), call the subroutine
specified by the address on the top of the
stack. The stack is popped and the return
address (Address Reg. + 1) is pushed onto the
stack. If CC is HIGH (fail), continue.




EXIT_D EXIT TO D
Unconditional branch to the address specified
by the D inputs and pop the stack. The D port
must be disabled to avoid bus contention.


EXIT TO A
Unconditional branch to the address specified
by the A inputs and pop the stack.


EXIT TO Multiway (D15 - D4 Mx3 - Mx0)
Unconditional branch to the address specified
by the D inputs concatenated with the M inputs
and pop the stack. The lower four bits on the D
bus (D3 - D0) are replaced by one of the four
sets of the 4-bit multiway branch addresses.
The multiway branch set is selected by bits D1
and D0 while D3 and D2 are "don't cares."


EXIT TO TOS
Unconditional branch to the address on the top
of the stack and pop the stack. Also used for
unconditional returns.


IF CC, THEN EXIT TO D
ELSE CONTINUE
If CC is HIGH (pass), exit to the address
specified by the D inputs and pop the stack. If
CC is LOW (fail), continue with no pop. The D
port must be disabled to avoid bus contention.


IF CC, THEN EXIT TO A
ELSE CONTINUE
If CC is HIGH (pass), exit to the address
specified by the A inputs and pop the stack. If
CC is LOW (fail), continue with no pop.


IF CC, THEN EXIT TO Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is HIGH (pass), exit to the address
specified by the D inputs concatenated with the
M inputs and pop the stack. The lower four bits
on the D bus (D3 - D0) are replaced by one of
the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


IF CC, THEN EXIT TO TOS
ELSE CONTINUE
If CC is HIGH (pass), exit to the address on the
top of the stack and pop the stack. If CC is
LOW (fail), continue with no pop. Also used for
conditional returns.


XTNC_D IF NOT CC, THEN EXIT TO D
ELSE CONTINUE
If CC is LOW (pass), exit to the address
specified by the D inputs and pop the stack. If
CC is HIGH (fail), continue with no pop. The D
port must be disabled to avoid bus contention.


IF NOT CC, THEN EXIT TO A
ELSE CONTINUE
If CC is LOW (pass), exit to the address
specified by the A inputs and pop the stack. If
CC is HIGH (fail), continue with no pop.


IF NOT CC, THEN EXIT TO Multiway
(D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is LOW (pass), exit to the address
specified by the D inputs concatenated with the
M inputs and pop the stack. The lower four bits
on the D bus (D3 - D0) are replaced by one of
the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


IF NOT CC, THEN EXIT TO TOS
ELSE CONTINUE
If CC is LOW (pass), exit to the address on the
top of the stack and pop the stack. If CC is
HIGH (fail), continue with no pop. Also used for
conditional returns.





IF CNT ≠ 1 THEN CNT := CNT - 1
GOTO D
ELSE CNT := CNT - 1
CONTINUE
If the counter is not equal to one, decrement
the counter and branch to the address
specified by the D inputs. If the counter is equal
to one, then decrement the counter and
continue. The D port must be disabled to avoid
bus contention.


IF CNT ≠ 1 THEN CNT := CNT - 1
GOTO A
ELSE CNT := CNT - 1
CONTINUE
If the counter is not equal to one, decrement
the counter and branch to the address
specified by the A inputs. If the counter is equal
to one, then decrement the counter and
continue.


IF CNT ≠ 1 THEN CNT := CNT - 1
GOTO Multiway (D15 - D4 Mx3 - Mx0)
ELSE CNT := CNT - 1
CONTINUE
If the counter is not equal to one, decrement
the counter and branch to the address
specified by the D inputs concatenated with the
M inputs. The lower four bits on the D bus
(D3 - D0) are replaced by one of the four sets
of the 4-bit multiway branch addresses. The
multiway branch set is selected by bits D1 and
D0 while bits D3 and D2 are "don't cares."


IF CNT ≠ 1 THEN CNT := CNT - 1
GOTO TOS
ELSE CNT := CNT - 1
POP STACK
CONTINUE
If the counter is not equal to one, decrement
the counter and branch to the address on the
top of the stack. If the counter is equal to one,
then decrement the counter, pop the stack and
continue.





Opcode (I5 - I0)   Mnemonics   Description   Execution Example
03H DJCC_D IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
GOTO D
ELSE CNT := CNT - 1
CONTINUE
If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D
inputs. If CC is LOW (fail) or the counter is
equal to one, then decrement the counter and
continue. The D port must be disabled to avoid
bus contention.


07H DJCC_A IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
GOTO A
ELSE CNT := CNT - 1
CONTINUE
If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the A inputs.
If CC is LOW (fail) or the counter is equal to
one, then decrement the counter and continue.


0BH DJCC_M IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
GOTO Multiway (D15 - D4 Mx3 - Mx0)
ELSE CNT := CNT - 1
CONTINUE
If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D inputs
concatenated with the M inputs. The lower four
bits on the D bus (D3 - D0) are replaced by one
of the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


0FH DJCC_S IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
GOTO TOS
ELSE CNT := CNT - 1
POP STACK
CONTINUE
If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address on the top of the stack. If
CC is LOW (fail) or the counter is equal to one,
then decrement the counter, pop the stack and
continue.


Note: Opcode numbers are in hexadecimal notation. 
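The conditional decrement-and-branch entries above share one rule: the counter is decremented on every execution, and the branch is taken only when the test passes and the counter was not equal to one. A minimal sketch of that rule follows; the function name and argument order are illustrative assumptions, counter wrap-around is not modeled, and the stack pop of the TOS variant is omitted.

```python
def djcc(cc_pass, counter, target, next_addr):
    """Model one DJCC-style instruction.

    Returns (next microprogram address, updated counter).
    cc_pass   -- True when the selected test passes
    target    -- branch address (from D, A, or multiway in the real part)
    next_addr -- address used on CONTINUE
    """
    take_branch = cc_pass and counter != 1
    return (target if take_branch else next_addr), counter - 1


print(djcc(True, 5, 0x40, 0x11))   # (64, 4): test passes, counter not one
print(djcc(True, 1, 0x40, 0x11))   # (17, 0): counter exhausted, continue
print(djcc(False, 5, 0x40, 0x11))  # (17, 4): test fails, still decrements
```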
13H DJNCC_D IF NOT CC AND CNT ≠ 1 THEN
CNT := CNT - 1
GOTO D
ELSE CNT := CNT - 1
CONTINUE
If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D
inputs. If CC is HIGH (fail) or the counter is
equal to one, then decrement the counter and
continue. The D port must be disabled to avoid
bus contention.


17H DJNCC_A IF NOT CC AND CNT ≠ 1 THEN
CNT := CNT - 1
GOTO A
ELSE CNT := CNT - 1
CONTINUE
If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the A inputs.
The content of the interrupt return address
register and the address register is replaced by
the A address in this case. If CC is HIGH (fail)
or the counter is equal to one, the current
address is incremented, appears on the bus for
continue, and is stored into the above two
registers.


1BH DJNCC_M IF NOT CC AND CNT ≠ 1 THEN
CNT := CNT - 1
GOTO Multiway (D15 - D4 Mx3 - Mx0)
ELSE CONTINUE
If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D inputs
concatenated with the M inputs. The lower four
bits on the D bus (D3 - D0) are replaced by one
of the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


1FH DJNCC_S IF NOT CC AND CNT ≠ 1 THEN
CNT := CNT - 1
GOTO TOS
ELSE CNT := CNT - 1
POP STACK
CONTINUE
If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address on the top of the stack. If
CC is HIGH (fail) or the counter is equal to one,
then decrement the counter, pop the stack and
continue.


2EH RET RETURN
Unconditional return from subroutine. The
return address is popped from the stack.


0EH RETCC IF CC THEN RETURN
ELSE CONTINUE
If CC is HIGH (pass), return from subroutine.
The return address is popped from the stack. If
CC is LOW (fail), continue.


1EH RETNC IF NOT CC THEN RETURN
ELSE CONTINUE
If CC is LOW (pass), return from subroutine.
The return address is popped from the stack. If
CC is HIGH (fail), continue.


34H 


38H 


35H 


39H 
3AH 


Mnemonics   Description


FOR_D INITIALIZE LOOP
Push the Address Reg. + 1 on the stack, load
the counter from the D inputs and continue.
Use with DJUMP_S for FOR ... NEXT loops.
The D port must be disabled to avoid bus
contention.


INITIALIZE LOOP
Push the Address Reg. + 1 on the stack, load
the counter from the A inputs and continue.
Use with DJUMP_S for FOR ... NEXT loops.


INITIALIZE LOOP
Push the Address Reg. + 1 on the stack and
continue. Use with BRCC_S for
REPEAT ... UNTIL loops, or with XTCC_D
and BRA_S for WHILE ... END WHILE loops.


POP_D Pop the stack and output the value on the D
outputs and continue. The D port must be
enabled.


POP_C Pop the stack and store the value in the
counter and continue.


PUSH_D Push the D inputs on the stack and continue.
The D port must be disabled to avoid bus
contention.


PUSH_C Push the counter on the stack and continue.


SWAP Exchange the counter and the top of stack and
continue.


Note: Opcode numbers are in hexadecimal notation. 
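How FOR_D pairs with the decrement-and-branch-to-TOS instruction to form a counted loop can be traced with a small simulation. This is a sketch under stated assumptions: the four-word microprogram, the BODY placeholder, and the use of the name DJUMP_S for the unconditional "IF CNT ≠ 1 ... GOTO TOS" entry are choices made for this example.

```python
def simulate_for_loop(count):
    """Trace a FOR_D ... DJUMP_S loop and return
    (times the body ran, final counter, final stack)."""
    program = {0: "FOR_D", 1: "BODY", 2: "DJUMP_S", 3: "DONE"}
    stack, counter, pc, body_runs = [], 0, 0, 0
    while program[pc] != "DONE":
        op = program[pc]
        if op == "FOR_D":
            stack.append(pc + 1)      # push Address Reg. + 1 (loop start)
            counter = count           # load counter from the D inputs
            pc += 1
        elif op == "BODY":
            body_runs += 1            # stands in for the loop body
            pc += 1
        elif op == "DJUMP_S":
            if counter != 1:
                counter -= 1
                pc = stack[-1]        # branch to top of stack, no pop
            else:
                counter -= 1
                stack.pop()           # loop finished: pop and continue
                pc += 1
    return body_runs, counter, stack


print(simulate_for_loop(10))  # (10, 0, [])
```

Loading the counter with N runs the body N times, and the stack is left as it was before the loop, which is what allows such loops to be nested.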


Mnemonics   Description


STACK_C Push the counter on the stack and load the
counter with the value of the D inputs and
continue.


LOAD_D Load the counter with the value of the D inputs
and continue. The D port must be disabled to
avoid bus contention.


LOAD_A Load the counter with the value of the A inputs
and continue.


CONT Continue.
DECR Decrement the counter and continue.
RESET_SP Reset the stack pointer and continue.


Load the comparison register with the value of 
the D inputs, enable the comparator and 
continue. 


Disable the comparator and continue. 


APPLICATIONS 


Figure 8. Typical Control-Path Architecture For Am29C300 Family
(Block diagram. Labels recovered from the drawing: Am29C331 with Address, Test, and CP; Interrupt Vector; Microprogram Memory; Pipeline Register with CP; Clock; Am29C332 with Inst., ALU, and Status Reg.)


Figure 9. Cycle Timing Waveform*
Segments labeled on the waveform:
ALU Status Register Output (Clock to Register Status Outputs of the Am29C332)
Am29C331 Test Inputs to Am29C331 Outputs (Test Inputs to Y Outputs)
Memory Outputs (Microprogram Memory Access Time)
Register Setup Time


*This waveform shows the timing relationship for the configuration shown in Figure 8. 
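For the configuration of Figure 8, a first estimate of the minimum microcycle can be sketched by adding the four segments of Figure 9. Treating the segments as strictly serial is an assumption of this sketch, and the numbers in the example call are placeholders chosen for arithmetic clarity, not data-sheet values.

```python
def min_cycle_time_ns(clk_to_status, test_to_y, mem_access, reg_setup):
    """Sum the serial delays around the control loop, in nanoseconds.

    clk_to_status -- clock to register status outputs (ALU)
    test_to_y     -- sequencer test inputs to Y outputs
    mem_access    -- microprogram memory access time
    reg_setup     -- pipeline register setup time
    """
    return clk_to_status + test_to_y + mem_access + reg_setup


# Placeholder delays only; substitute the limits for the parts in use.
print(min_cycle_time_ns(10, 20, 30, 5))  # 65
```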
ABSOLUTE MAXIMUM RATINGS


Storage Temperature                          -65 to +150°C
(Case) Temperature Under Bias                -55 to +125°C
Supply Voltage to Ground Potential
  Continuous                                 -0.3 V to +7.0 V
DC Voltage Applied to Outputs For
  High Output State                          -0.3 V to +VCC +0.3 V
DC Input Voltage                             -0.3 V to +VCC +0.3 V
DC Output Current, Into LOW Outputs
DC Input Current                             -10 mA to +10 mA


Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES


Commercial (C) Devices
Temperature (TA)                             0 to +70°C
Supply Voltage (VCC)                         +4.75 V to +5.25 V


Military* (M) Devices
Temperature (TA)                             -55 to +125°C
Supply Voltage (VCC)                         +4.5 V to +5.5 V


Operating ranges define those limits between which the
functionality of the device is guaranteed.


*Military Product 100% tested at TA = +25°C, +125°C, and
-55°C.


DC CHARACTERISTICS over operating range unless otherwise specified (for APL Products, Group A,
Subgroups 1, 2, 3 are tested unless otherwise noted)


Parameter Symbol   Parameter Description (Test Conditions)

VOH   Output HIGH Voltage (VCC = Min., VIN = VIH or VIL)
VOL   Output LOW Voltage (VCC = Min., VIN = VIH or VIL, IOL = 8 mA)
VIH   Guaranteed Input Logical HIGH Voltage (Note 2)
VIL   Guaranteed Input Logical LOW Voltage (Note 2)
      Input LOW Current
      Off-State (HIGH Impedance) Output Current (VCC = Max., VO = 0.5 Volts)
      Static Power Supply Current (Note 3) (VCC = Max., VIN = VCC or GND, IO = 0 µA)
CPD   Capacitance (VCC = 5.0 V, TA = 25°C, No Load), pF Typical

Notes: 1. VCC conditions shown as Min. or Max. refer to the commercial and military VCC limits. 
2. These input levels provide zero-noise immunity and should only be statically tested in a noise-free environment (not functionally tested). 
3. Worst-case ICC is measured at the lowest temperature in the specified operating range. 
4. CPD determines the no-load dynamic current consumption: 
ICC (Total) = ICC (Static) + CPD VCC f, where f is the switching frequency of the majority of the internal nodes, normally one-half of the clock 
frequency. This specification is not tested. 
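The relation in Note 4 can be evaluated directly. The sketch below is a worked example only; the function name is an assumption, and the static current and CPD figures in the call are placeholders rather than values taken from this data sheet.

```python
def total_icc_amps(icc_static_a, cpd_farads, vcc_volts, clock_hz):
    """ICC(Total) = ICC(Static) + CPD * VCC * f, with f taken as one-half
    of the clock frequency as stated in Note 4."""
    f = clock_hz / 2.0
    return icc_static_a + cpd_farads * vcc_volts * f


# Placeholder inputs: 10 mA static, 500 pF, 5.0 V, 10 MHz clock.
icc = total_icc_amps(0.010, 500e-12, 5.0, 10e6)
print(f"{icc * 1000:.1f} mA")
```

With these placeholder inputs the dynamic term is 500 pF x 5.0 V x 5 MHz = 12.5 mA, which is added to the static current.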
SWITCHING CHARACTERISTICS over COMMERCIAL operating range 
A. COMBINATIONAL PROPAGATION DELAYS 


Notes: See notes following Table D. 


*This includes using D as select lines for multiway sets. 
**In the slave mode. 
SWITCHING CHARACTERISTICS over COMMERCIAL operating range (Cont'd.) 


B. OUTPUT DISABLE TIME 


Description                     29C331   29C331-1   29C331-2


Reset-to-Address Enable
Reset-to-Address Disable
INTR-to-Address Enable
INTR-to-Address Disable
INTEN-to-Address Enable
INTEN-to-Address Disable
HOLD-to-Address Enable
HOLD-to-Address Disable
SLAVE-to-Address Enable
SLAVE-to-Address Disable
OED-to-Data Enable
OED-to-Data Disable
Reset-to-Data Enable
Reset-to-Data Disable
SLAVE-to-Data Enable
SLAVE-to-Data Disable
Clock-to-Data Enable
Clock-to-Data Disable
HOLD-to-INTA Enable
HOLD-to-INTA Disable
HOLD-to-A-FULL Enable
HOLD-to-A-FULL Disable
HOLD-to-EQUAL Enable
HOLD-to-EQUAL Disable
SLAVE-to-INTA Enable
SLAVE-to-INTA Disable
SLAVE-to-A-FULL Enable
SLAVE-to-A-FULL Disable
SLAVE-to-EQUAL Enable
SLAVE-to-EQUAL Disable








Notes: See notes following Table D. 













SWITCHING CHARACTERISTICS over COMMERCIAL operating range (Cont'd.) 


C. SETUP AND HOLD TIMES 


Parameter   With Respect To   29C331   29C331-1   29C331-2


Data Setup 

Data Hold 

Alternate Data Setup 
Alternate Data Hold 
Multiway Setup 
Multiway Hold 
Address Setup 
Address Hold 
Instruction Setup 
Instruction Hold 
Forced Continue Setup 
Forced Continue Hold 
Test Setup 

Test Hold 

Select Setup 

Select Hold 

Reset Setup 

Reset Hold 

Interrupt Request Setup 
Interrupt Request Hold 
Interrupt Enable Setup 
Interrupt Enable Hold 
Hold Mode Setup 
Hold Mode Hold 
Carry-In Setup 
Carry-In Hold 














































Notes: 1. (INTR, INTEN)-to-EQUAL is the sum of (INTR, INTEN)-to-Y disable time and Y-to-EQUAL delay 
time. 
2. CL = 50 pF; CL = 5 pF for Disable Time only. 
SWITCHING CHARACTERISTICS over MILITARY operating range (for APL Products, Group A, Subgroups 
9, 10, 11 are tested unless otherwise noted) 


A. COMBINATIONAL PROPAGATION DELAYS 

















Notes: See notes following Table D. 
*This includes using D as select lines for multiway sets. 
**In the slave mode. 
SWITCHING CHARACTERISTICS over MILITARY operating range (Cont'd.) 


B. OUTPUT DISABLE TIME 


Description                     29C331 Max. Value


Reset-to-Address Enable 
Reset-to-Address Disable 
INTR-to-Address Enable 
INTR-to-Address Disable 
INTEN-to-Address Enable 
INTEN-to-Address Disable 
HOLD-to-Address Enable 
HOLD-to-Address Disable 
SLAVE-to-Address Enable 
SLAVE-to-Address Disable 
OED-to-Data Enable 
OED-to-Data Disable 
Reset-to-Data Enable 
Reset-to-Data Disable 
SLAVE-to-Data Enable 
SLAVE-to-Data Disable 
Clock-to-Data Enable 
Clock-to-Data Disable 
HOLD-to-INTA Enable 
HOLD-to-INTA Disable 
HOLD-to-A-FULL Enable 
HOLD-to-A-FULL Disable 
HOLD-to-EQUAL Enable 
HOLD-to-EQUAL Disable 
SLAVE-to-INTA Enable 
SLAVE-to-INTA Disable 
SLAVE-to-A-FULL Enable 
SLAVE-to-A-FULL Disable 
SLAVE-to-EQUAL Enable 
SLAVE-to-EQUAL Disable 


Notes: See notes following Table D. 


SWITCHING CHARACTERISTICS over MILITARY operating range (Cont'd.) 


C. SETUP AND HOLD TIMES 





29C331 
Parameter With Respect To 


Data Setup 

Data Hold 

Alternate Data Setup 
Alternate Data Hold 
Multiway Setup 
Multiway Hold 
Address Setup 
Address Hold 
Instruction Setup 
Instruction Hold 
Forced Continue Setup 
Forced Continue Hold 
Test Setup 

Test Hold 

Select Setup 

Select Hold 

Reset Setup 

Reset Hold 

Interrupt Request Setup 
Interrupt Request Hold 
Interrupt Enable Setup 
Interrupt Enable Hold 
Hold Mode Setup 
Hold Mode Hold 
Carry-In Setup 
Carry-In Hold 





29C331 


Max. Value 
53 Minimum Clock LOW Time 33 ns 
54 Minimum Clock HIGH Time 28 ns 
Notes: 1. (INTR, INTEN)-to-EQUAL is the sum of (INTR, INTEN)-to-Y disable time and Y-to-EQUAL delay 
time. 
2. CL = 50 pF; CL = 5 pF for Disable Time only. 
3. The status of I5 - I0 and FC must not be changed during the clock LOW time. 
SWITCHING TEST CIRCUIT 


A. Three-State Outputs 
(Circuit diagram: load network on the output with switches S1, S2, and S3, supplied from VCC.)


Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture. 
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests. 
3. S1 and S3 are closed while S2 is open for tPZH test. 
S1 and S2 are closed while S3 is open for tPZL test. 
4. CL = 5.0 pF for output disable tests. 
SWITCHING TEST WAVEFORMS 


Setup, Hold, and Release Times 
(Waveform: DATA INPUT and TIMING INPUT.)

Notes: 1. Diagram shown for HIGH data only. Output 
transition may be opposite sense. 
2. Cross hatched area is don't care condition. 


Propagation Delay 
(Waveform: SAME PHASE INPUT TRANSITION and OPPOSITE PHASE INPUT TRANSITION, between 0 V and 3 V.)


Pulse Width 
(Waveform: LOW-HIGH-LOW PULSE and HIGH-LOW-HIGH PULSE.)


Enable and Disable Times 
(Waveform: CONTROL INPUT, OUTPUT NORMALLY LOW, and OUTPUT NORMALLY HIGH, with 1.5 V and 0.5 V reference levels.)

Notes: 1. Diagram shown for Input Control Enable-LOW 
and Input Control Disable-HIGH. 
2. S1, S2, and S3 of Load Circuit are closed 
except where shown. 
Test Philosophy and Methods 


The following points give the general philosophy that we apply 
to tests that must be properly engineered if they are to be 
implemented in an automatic environment. The specifics of 
what philosophies applied to which test are shown. 


1. Ensure the part is adequately decoupled at the test head. 
Large changes in supply current when the device switches 
may cause function failures due to Vcc changes. 


2. Do not leave inputs floating during any tests, as they may 
oscillate at high frequency. 


3. Do not attempt to perform threshold tests at high speed. 
Following an input transition, ground current may change by 
as much as 400 mA in 5-8 ns. Inductance in the ground 
cable may allow the ground pin at the device to rise by 
hundreds of millivolts momentarily. Current level may vary 
from product to product. 


4. Use extreme care in defining input levels for AC tests. Many 
inputs may be changed at once, so there will be significant 
noise at the device pins which may not actually reach VIL or 
VIH until the noise has settled. AMD recommends using 
VIL ≤ 0 V and VIH ≥ 3 V for AC tests. 


5. To simplify failure analysis, programs should be designed to 
perform DC, Function, and AC tests as three distinct groups 
of tests. 


6. Capacitive Loading for AC Testing 


Automatic testers and their associated hardware have stray 
capacitance which varies from one type of tester to 
another, but is generally around 50 pF. This makes it 
impossible to make direct measurements of parameters 
which call for a smaller capacitive load than the associated 
stray capacitance. Typical examples of this are the so-called 
"float delays," which measure the propagation 
delays into and out of the high-impedance state, and are 
usually specified at a load capacitance of 5.0 pF. In these 
cases, the test is performed at the higher load capacitance 
(typically 50 pF), and engineering correlations based on 
data taken with a bench setup are used to predict the result 
at the lower capacitance. 


Similarly, a product may be specified at more than one 
capacitive load. Since the typical automatic tester is not 
capable of switching loads in mid-test, it is impossible to 
make measurements at both capacitances even though 
they may both be greater than the stray capacitance. In 
these cases, a measurement is made at one of the two 
capacitances. The result at the other capacitance is 
predicted from engineering correlations based on data 
taken with a bench setup and the knowledge that certain 
DC measurements (IOH, IOL, for example) have already 
been taken and are within specification. In some cases, 
special DC tests are performed in order to facilitate this 
correlation. 


7. Threshold Testing 


The noise associated with automatic testing, the long 
inductive cables, and the high gain of bipolar devices when 
in the vicinity of the actual device threshold frequently give 
rise to oscillations when testing high-speed circuits. These 
oscillations are not indicative of a reject device, but instead, 
of an overtaxed test system. To minimize this problem, 
thresholds are tested at least once for each input pin. 
Thereafter, "hard" high and low levels are used for other 
tests. Generally this means that function and AC testing are 
performed at "hard" input levels rather than at VIL max. 
and VIH min. 


8. AC Testing 


Occasionally parameters are specified that cannot be 
measured directly on automatic testers because of tester 
limitations. Data input hold times often fall into this category. 
In these cases, the parameter in question is guaranteed 
by correlating these tests with other AC tests that have 
been performed. These correlations are arrived at by the 
cognizant engineer using data from precise bench measurements 
in conjunction with the knowledge that certain DC 
parameters have already been measured and 
are within specification. 


In some cases, certain AC tests are redundant since they 
can be shown to be predicted by other tests that have 
already been performed. In these cases, the redundant 
tests are not performed. 


9. Output Short-Circuit Current Testing 


When performing IOS tests on devices containing RAM or 
registers, great care must be taken that undershoot caused 
by grounding the high-state output does not trigger parasitic 
elements which in turn cause the device to change state. 
In order to avoid this effect, it is common to make the 
measurement at a voltage (VOUTPUT) that is slightly above 
ground. The VCC is raised by the same amount so that the 
result (as confirmed by Ohm's law and precise bench 
testing) is identical to the VOUT = 0, VCC = Max. case. 


SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS 


[Figure: key to switching waveform symbols; the legend for each symbol follows]

INPUTS                              OUTPUTS

MUST BE STEADY                      WILL BE STEADY
MAY CHANGE FROM H TO L              WILL BE CHANGING FROM H TO L
MAY CHANGE FROM L TO H              WILL BE CHANGING FROM L TO H
DON'T CARE; ANY CHANGE PERMITTED    CHANGING; STATE UNKNOWN
DOES NOT APPLY                      CENTER LINE IS HIGH IMPEDANCE "OFF" STATE

SWITCHING WAVEFORMS (Cont'd.) 

[Figure: input test waveform with the 3.0 V, 1.5 V and 0 V levels marked]

[Figure: timing diagram over CYCLE 1 and CYCLE 2, showing Y OFF and the INT-VECT BUFFER with VECT ON and VECT OFF]

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Interrupt Timing 


1. Interrupt Request comes from an interrupt-controller register. It reflects the CP ↑ to INTR time of 
the interrupt controller. 

2. During Cycle 2, there may be contention on the Y-bus if the Y-bus is turned ON before the INT- 
VECT buffer is turned OFF. 

3. Refer to Figures 4 and 5 for definition of A and B. 
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Am29C332 


CMOS 32-Bit Arithmetic Logic Unit 


ADVANCE INFORMATION 


DISTINCTIVE CHARACTERISTICS 


• Single Chip, 32-Bit ALU 
  Standard product supports 110-ns microcycle time 
  for the 32-bit data path. It is a combinatorial ALU 
  with equal cycle time for all instructions. 
  Speed Select supports 80-ns system cycle time. 

• Flow-through Architecture 
  A combinatorial ALU with two input data ports and 
  one output data port allows implementation of either 
  parallel or pipelined architectures. 

• 64-Bit In, 32-Bit Out Funnel Shifter 
  This unique functional block allows n-bit shift-up, 
  shift-down, 32-bit barrel shift or 32-bit field extract. 

• Supports All Data Types 
  It supports one-, two-, three- and four-byte data for 
  all operations and variable-length fields for logical 
  operations. 

• Multiply and Divide Support 
  Built-in hardware to support two-bit-at-a-time modified 
  Booth's algorithm and one-bit-at-a-time division 
  algorithm. 

• Extensive Error Checking 
  Parity check and generate provides data transmission 
  check and master/slave mode provides complete 
  function checking. 


GENERAL DESCRIPTION 


The Am29C332 is a 32-bit wide non-cascadable Arithmetic 
Logic Unit (ALU) with integration of functions that normally 
don't cascade, such as barrel shifters, priority encoders 
and mask generators. Two input data ports and one output 
data port provide flow-through architecture and allow the 
designer to implement his/her architecture with any degree 
of pipelining and no built-in penalties for branching. Also, 
the simplicity of a three-bus ALU allows easy implementa- 
tion of parallel or reconfigurable architectures. The register 
file is off-chip to allow unlimited expansion and regular 
addressability. 


The Am29C332 supports one-, two-, three- and four-byte 
data for arithmetic and logic operations. It also supports 
multiprecision arithmetic and shift operations. For logical 
operations, it can support variable-length fields up to 32 
bits. When fewer than four bytes are selected, unselected 
bits are passed to the destination without modification. The 
device also supports two-bit-at-a-time modified Booth's 
algorithm for high-speed multiplication and one-bit-at-a- 
time division. Both signed and unsigned integers for all byte 
aligned data types mentioned above are supported. 


The Am29C332 is designed to support 110-ns microcycle 
time at standard speed, and 80-ns microcycle time with speed 
select. The device is packaged in a 169-lead pin-grid-array 
package. 


SIMPLIFIED BLOCK DIAGRAM 
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RELATED AMD PRODUCTS 


Am29C01    CMOS 4-Bit Microprocessor Slice 
Am29C116   CMOS 16-Bit Microcontroller 
Am29C323   CMOS 32 x 32 Parallel Multiplier 
Am29325    32-Bit Floating Point Processor 
Am29C325   CMOS 32-Bit Floating Point Processor 
Am29331    16-Bit Microprogram Sequencer 
Am29C516   CMOS 16 x 16 Multiplier 
Am29C517   CMOS 16 x 16 Multiplier with Separate I/O 


CONNECTION DIAGRAM 
169-Lead PGA 
Bottom View 

[Figure: 169-lead PGA pin map, bottom view]
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PIN DESIGNATIONS 
(Sorted by Pin Names) 
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LOGIC SYMBOL 


[Figure: logic symbol; outputs shown include Y0-Y31, PY0-PY3 and MSERR]


ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid 
Combination) is formed by a combination of: a. Device Number 

b. Speed Option (if applicable) 

c. Package Type 

d. Temperature Range 

e. Optional Processing 


AM29C332  -1  G  C  B 


e. OPTIONAL PROCESSING 
   Blank = Standard processing 
   B = Burn-in 

d. TEMPERATURE RANGE 
   C = Commercial (0 to +85°C) 

c. PACKAGE TYPE 
   G = 169-Lead Pin Grid Array without Heatsink (CGX169) 

b. SPEED OPTION 
   -1 = Speed Select 
   -2 = Speed Select (TBD) 

a. DEVICE NUMBER/DESCRIPTION 
   Am29C332 
   CMOS 32-Bit Arithmetic Logic Unit 


Valid Combinations 

AM29C332-1 


Valid Combinations 

Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations, to check on newly released combinations, and 
to obtain additional data on AMD's standard military grade 
products. 
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ORDERING INFORMATION (Cont'd.) 
APL Products 


AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved 
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL 
products is formed by a combination of: a. Device Number 

b. Speed Option (if applicable) 

c. Device Class 

d. Package Type 

e. Lead Finish 


AM29C332  /B  Z  C 


e. LEAD FINISH 
   C = Gold 


d. PACKAGE TYPE 
Z = 169-Lead Pin Grid Array without Heatsink 
(CGX169) 


c. DEVICE CLASS 
/B =Class B 


b. SPEED OPTION 
Not Applicable 


a. DEVICE NUMBER/DESCRIPTION 
Am29C332 
CMOS 32-Bit Arithmetic Logic Unit 


Valid Combinations 
AM29C332 /BZC 










Valid Combinations 


Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations or to check for newly released valid 
combinations. 





Group A Tests 


Group A tests include Subgroups 
1, 2, 3, 7, 8, 9, 10, 11. 
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PIN DESCRIPTION 


BOROW   Borrow (Input) 
When HIGH, the Carry In and Carry Out are borrows for 
subtract operations. 


C, Z, N, V, L   Status (Input/Output) 

When the Register Status pin is LOW, these pins give the 
Carry, Zero, Negative, Overflow and Link outputs of the ALU 
where applicable to the instruction being executed. When 
not applicable to the instruction being executed, or when the 
Register Status pin is HIGH, these pins give the outputs of 
the Carry, Zero, Negative, Overflow and Link bits of the 
internal Status Register. In Slave mode, C, Z, N, V and L 
become inputs. 


CP   Clock Input (Input) 
Clocks internal registers (status, Q) at the LOW to HIGH 
transition, provided HOLD input is LOW. 


DA0-DA31   Data Input for DA-bus (Input) 
Data input lines for operand A. 


DB0-DB31   Data Input for DB-bus (Input) 
Data input lines for operand B. 


HOLD Hold (Input, Active HIGH) 
When HIGH, it inhibits the update of the status and Q 
registers. 


I0-I6   Instruction Inputs (Input) 
Used to select the operation to be performed. 


I7-I8   Byte Width Inputs (Input) 
Byte width inputs for byte boundary aligned operand 
instructions. For variable field bit operands, these pins select the 
sources of the width and position inputs. If I7 is LOW, the width 
input is taken from pins W4-W0. If I7 is HIGH, the width 
input is taken from the internal width register. Similarly, if 
I8 is LOW, the position inputs are taken from pins P5-P0, and 
if HIGH, from the internal position register. 


MCin Macro Status Carry (Input) 
External Carry input. 


MLINK Macro Status Link (Input) 
External link input. 


M/m   Macro/Micro Select (Input) 
When HIGH, selects macro carry and macro link pins as 
input instead of micro carry and micro link from the micro- 
status register. 


MSERR   Master-Slave Error (Output) 
When HIGH, this signal indicates that the master's and 
slave's data were not identical. 


OE-Y   Output Enable (Input, Active LOW) 
When OE-Y is HIGH the Y-bus is disabled (three-stated). 


P0-P5   Position Inputs (Input) 
Position input to select the position of the least significant bit 
of a field. Also indicates the amount by which data is to be 
shifted up (P5 = LOW) or down (P5 = HIGH) or rotated. 


PA0-PA3   Parity Input for DA-bus (Input) 
Parity input for operand A on DA-bus (one per byte). 
Even parity is used for the Am29C332. 


PB0-PB3   Parity Input for DB-bus (Input) 
Parity input for operand B on DB-bus (one per byte). 


PERR   Parity Error (Input/Output) 
When HIGH, indicates that a parity error was detected on 
the DA or DB inputs. 


PY0-PY3   Parity for Y-bus (Input/Output) 
Parity output for data on Y-bus (one per byte). Even parity is 
used for the Am29C332. In slave mode, PY0-PY3 become 
inputs. 


RS Register Status Mode Pin (Input) 
Selects between ALU status (Register Status = LOW) or 
register status (Register Status = HIGH) on the C, Z, N, V 
and L outputs. 


SLAVE   Slave (Input) 
When HIGH, this pin puts the ALU in the slave mode. All 
output pins become input pins and signals on them are 
compared with the ALU's internally generated results. When 
OE-Y is HIGH, the Y0-Y31 and PY0-PY3 inputs are 
ignored. When the SLAVE pin is LOW, the ALU is put in 
master mode where outputs are generated as normal. 


W0-W4   Width Inputs (Input) 
Width input to select the width of a contiguous bit field. 


Y0-Y31   Data Out/In Lines (Input/Output) 
When OE-Y is LOW and the ALU is in the Master mode, the 
ALU result is enabled on the Y-bus. When OE-Y is HIGH, 
the Y-bus is three-stated. In Slave mode the Y-bus acts as 
external data input. 


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Figure 1. Detailed Block Diagram 
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Figure 2. Am29C332 Family High-Performance System Block Diagram 


PRODUCT OVERVIEW 


The Am29C332 is a 32-bit wide, high-performance, non- 
expandable Arithmetic Logic Unit (ALU). It has two 32-bit wide 
input ports (A and B) and one 32-bit wide output port (Y). 
These three ports provide flexibility and accessibility for high- 
performance processor designs. Dedicated input and output 
ports provide a flow-through architecture and avoid the 
penalty associated with switching the bus half-way through the 
cycle for input and output of data. The chip is designed for use 
with a dual-access RAM (Am29C334) as a register file. In 
addition, the three-bus architecture facilitates the connection 
of other arithmetic units in parallel with the Am29C332 for 
high-performance systems. 


The Am29C332 supports one-, two-, three-, and four-byte 
arithmetic operations. It also supports multiprecision arithme- 
tic and multiple bit shifts. For logical operations, it can handle 
variable-length fields of up to 32 bits. The chip incorporates 
dedicated hardware to allow efficient implementation of a two 
bit-at-a-time (modified Booth) multiply algorithm, supporting 
signed and unsigned arithmetic data types. Similarly, hardware 
is provided to support a bit-at-a-time divide algorithm, also 
supporting signed and unsigned arithmetic data types. An 
internal 32-bit register (Q) is used by the multiply and divide 
hardware for double precision operands. For business applications, 
the Am29C332 supports variable-length BCD arithmetic. 


Field logical instructions operate on bit-fields taken from the A 
and B data inputs; they may be of variable width and starting 
position. A is normally the source input and B the destination 
input. In general, destination bits not falling within a specified 
field are passed by the ALU unchanged. Field width and 
position are specified either by direct inputs to the chip, or by 
entries in the status register. There are two kinds of field 
logical instructions — aligned and non-aligned. The first type of 
instruction assumes that source and destination fields are 
aligned and the operation is performed only for bits within the 
specified fields. In the second type of instruction, source and 
destination fields are normally non-aligned. However, it is 
always assumed that one field (either source or destination) is 
least-significant-bit (LSB) aligned. 


If the destination field is LSB aligned then the source field is 
downshifted in order to make it LSB aligned as well. Downshifting 
is accomplished by making the 6-bit position input 
equal to the two's complement of the number of places the 
field is to be downshifted. If the source field is LSB aligned 
then it is upshifted in order to align it with the destination. 
Upshifting is accomplished by making the position inputs equal 
to the number of places the field is to be upshifted. Any other 
type of field operation is not allowed. Whenever the field 
crosses the word boundary, the portion not falling within the 
word boundary is ignored. This effect is useful when perform- 
ing operations on fields that overlap two different words. 
Instructions to perform straightforward multiple-bit shifts (ei- 
ther up or down) are also provided. Additionally, it is possible 
to extract a bit-field from a word in one instruction, even if that 
field overlaps a word boundary. 


The power and the flexibility of the processor comes partly 
from its ability to generate a mask to control the width of an 
operation for each instruction without any overhead. For all 
byte aligned instructions (three quarters of the instruction set), 
the mask is either 1, 2, 3 or 4 bytes wide and is generated from 
the byte width input (I8-I7). For all field instructions the mask 
is of variable width and is generated from the position inputs 
(P0-P5) and the width inputs (W0-W4). Table 1 describes 
the position displacement from the position inputs and Table 2 
the bit field from the width inputs. 


TABLE 1. POSITION INPUTS AND BIT 
DISPLACEMENT 


Inputs                    Bit Displacement 
P5  P4  P3  P2  P1  P0    p 

0   0   0   0   0   0     0 
...
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TABLE 2. WIDTH INPUTS AND BIT FIELD 


Inputs                    Bit Field 
W4  W3  W2  W1  W0        w 

0   0   0   0   0         32 
0   0   0   0   1         1 
0   0   0   1   0         2 
...
1   1   1   1   1         31 


Whenever the width of the operand is less than 32-bits, all 
unselected bits from the inputs of the ALU are passed to the 
output without any modification. Depending upon the instruc- 
tion type, unselected bits are taken from different sources. For 
example in all single operand instructions, bits from the source 
operand (from either A or B input) are passed in unselected bit 
positions. For two operand instructions, bits from the B input 


are passed in unselected bit positions. There are some 
exceptions which are explained in the instruction set section. 


The processor has a 32-bit status register to indicate the 
status of different operations performed. The status register is 
loaded at the rising edge of the clock with new status unless 
the HOLD signal is HIGH. The bit position for each status bit is 
given in the functional description. The least significant byte of 
the status register holds the six position bits (PR0-PR5). The 
two most significant bits of this byte may be read or loaded but 
are otherwise unused by the ALU. The second byte (bits 8 to 
15) consists of the five width bits (WR0-WR4) and three read- 
only bits that are a combinational function of other status bits, 
and which indicate useful branch conditions. The third byte 
consists of ALU status bits plus bits for high-speed multiply 
and divide. The most significant byte holds intermediate nibble 
carries for BCD operations. An extract-status instruction is 
provided which allows a Boolean value to be formed from any 
selected bit. This is particularly useful in machines employing a 
stack architecture. Instructions to save and restore the status 
register are provided. As the entire status of each instruction is 
stored in the status register, interrupts at any microinstruction 
boundary are feasible. 


The processor has a 32-bit wide priority encoder to support 
floating-point and graphics operations. The priority encoder 
supports all byte aligned data types - the result is dependent 
upon the byte width specified. The result of a priority encode is 
also loaded into the position bits of the status register. The 
result of the prioritize operation can then be used in the 
following clock cycle, e.g., to normalize a floating-point num- 
ber or to help detect the edge of a polygon in graphics 
applications. 


To support system diagnostics, the Am29C332 has a special 
"Master-Slave" mode. To use this mode, two chips are 
connected in parallel, and hence receive the same instructions 
and data. The master chip is used for the normal data path. 
However, in the slave chip, all outputs become inputs. The 
slave compares the outputs of the master with its own 
internally generated result. If the two do not match, the slave 
will activate an error signal. 


As a further diagnostic aid, byte-wise parity checking is 
performed at both the A and B data inputs. The "parity" signal 
is activated if an error is detected. Parity bits (one per byte) are 
generated for the 32-bit output bus. 


FUNCTIONAL DESCRIPTION 


A detailed description of each functional block is given in the 
following paragraphs. 


64-Bit Funnel Shifter 


The 64-bit funnel shifter is a combinatorial network. The 64-bit 
input is formed from a combination of the A and B inputs. This 
may be left-shifted by up to 31 bits before being used by the 
ALU. The output of the shifter is the most significant 32 bits of 
the result. The 64-bit shifter can be used on either the A or B 
operands to perform barrel shifts (either up or down) or 
rotates. The operation is controlled by positioning operands 
properly at the input of the 64-bit up-shifter. 


The number "n" by which the operand is shifted comes from 
two sources: the microprogram memory via the P0-P5 pins or 
the internal register (byte 0 of the status register), PR0-PR5, 
as selected by an instruction bit. 


In general, the 6-bit position input, P0-P5, takes a 6-bit two's 
complement number representing upshifts from 0 to 31 places 
(positive numbers) or downshifts from 1 to 32 places (negative 
numbers). 
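

The funnel-shifter behavior described above can be sketched in a few lines of Python. This is only an illustrative software model, not the device's logic: the function names are hypothetical, and the way the operand is placed in the upper or lower half of the 64-bit input (with zeros in the other half for plain shifts) is an assumption about how "positioning operands properly" could work.

```python
MASK32 = 0xFFFFFFFF

def decode_position(p):
    """Interpret the 6-bit P5-P0 value as two's complement (-32..+31)."""
    p &= 0x3F
    return p - 64 if p & 0x20 else p

def funnel_shift(hi, lo, n):
    """Left-shift the 64-bit value hi:lo by n (0..31) and return the
    most significant 32 bits of the result."""
    if not 0 <= n <= 31:
        raise ValueError("shift amount must be 0..31")
    wide = ((hi & MASK32) << 32) | (lo & MASK32)
    return ((wide << n) >> 32) & MASK32

def shift(a, p):
    """Zero-fill shift of a by the signed 6-bit position p (assumed fill)."""
    s = decode_position(p)
    if s >= 0:
        return funnel_shift(a, 0, s)      # upshift: operand in the upper half
    return funnel_shift(0, a, 32 + s)     # downshift by -s: operand in the lower half

def rotate_up(a, n):
    """Rotate by feeding the same operand to both halves."""
    return funnel_shift(a, a, n)
```

For example, `shift(0x80000000, 0x3F)` decodes 0x3F as -1 and downshifts by one place, while `rotate_up(0x80000001, 1)` wraps the top bit around to bit 0.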
Mask Generator 


The mask generator logic provides the ability to generate the 
appropriate mask for an operand of given width and position. 
The generation of the mask depends upon two types of 
instructions. The first type has byte boundary aligned operands 
(widths of either 1, 2, 3 or 4 bytes) with the least 
significant bit aligned to bit 0. The width of an operand is 
specified by the byte width inputs (I8 and I7) as shown in Table 
3. The second type of instruction has operands of variable 
width (1 to 32 bits) and position. The operand is specified by 
the width inputs (W0-W4) and the position inputs (P0-P5) 
indicating the least significant bit position of the operand. 
Thus, in this type of instruction the operand may or may not be 
least significant bit aligned. Depending upon the type of 
instruction, the mask generator first generates a fence of all 
zeros starting from the least significant bit with the width 
specified either by the byte width or the width input fields. This 
fence can be upshifted by up to 31 bits by the 32-bit mask 
shifter. Whenever the mask is moved up over the 32-bit 
boundary, it does not wrap around. Instead, ONE's are 
inserted from the least significant end. This configuration 
provides the ability to operate on a contiguous field located 
anywhere in a word, or across a word boundary. 


The mask generator can be used as a pattern generator by 
allowing the mask to pass through the ALU (by using the PASS-MASK 
instruction). For example, a single-bit wide mask can be 
generated, and shifting it up by different amounts gives 
walking ONE or walking ZERO patterns for memory tests. 
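

As a rough illustration of the mask generation just described, the following Python sketch builds a fence of zeros of the requested width, shifts it up without wrap-around, and fills with ONEs from the least significant end. The function names are hypothetical and the model takes the width directly as 1 to 32 rather than as an encoded pin value.

```python
MASK32 = 0xFFFFFFFF

def field_mask(width, position):
    """Return a 32-bit mask in which 0 marks a selected bit and 1 an
    unselected bit, for a field of `width` bits whose LSB is at `position`."""
    if not 1 <= width <= 32 or not 0 <= position <= 31:
        raise ValueError("width must be 1..32 and position 0..31")
    fence = (1 << width) - 1                  # `width` selected bits at the LSB end
    selected = (fence << position) & MASK32   # bits pushed past bit 31 are dropped
    return ~selected & MASK32                 # zeros where selected, ONEs elsewhere

def byte_mask(nbytes):
    """Mask for a byte boundary aligned operand of 1 to 4 bytes."""
    return field_mask(8 * nbytes, 0)

def walking_ones():
    """Walking-ONE test patterns from the complement of a 1-bit mask."""
    return [~field_mask(1, p) & MASK32 for p in range(32)]
```

The mask itself for a 1-bit field is already a walking-ZERO pattern, so `field_mask(1, p)` for p = 0 to 31 gives the complementary sequence.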


TABLE 3. 


I7-I8     Width in Bytes 

00        4 
01        1 
10        2 
11        3 


Arithmetic and Logical Unit 


The ALU is a three input unit which uses the mask as a second 
or third operand in every instruction. The mask is used to 
merge two operands. For all selected bits (wherever the mask 
is 0), the desired operation specified by the instruction input is 
performed, and for all unselected bits either corresponding 
destination bits or zeros are passed through. The status of 
each operation (carry, negative, zero, overflow, link) applies to 
the result only over the specified width. For all byte aligned 
arithmetic and logical operations (first three quarters of the 
instruction set), the status is extracted from the appropriate 
byte boundary. For all field operations (last quarter of the 
instruction set), the operand width is assumed to be 32 bits for 
status generation. The ZERO flag always indicates the status 
of all bits selected by the mask. 
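

The merge performed by the mask can be modelled as below. This is a minimal sketch assuming the convention stated above (mask bit 0 = selected); the helper name `masked_op` and its `fill` parameter are illustrative and do not correspond to any device signal.

```python
MASK32 = 0xFFFFFFFF

def masked_op(op, a, b, mask, fill=None):
    """Apply `op` to a and b where the mask is 0; elsewhere pass the
    destination operand (b by default) or the value given in `fill`."""
    raw = op(a, b) & MASK32
    passthrough = b if fill is None else fill
    result = (raw & ~mask & MASK32) | (passthrough & mask)
    zero = (result & ~mask & MASK32) == 0   # ZERO looks only at selected bits
    return result, zero

# OR the low byte of A into B, leaving the upper three bytes of B untouched.
result, zero = masked_op(lambda x, y: x | y, 0x0000000F, 0x12345600, 0xFFFFFF00)
```

Passing `fill=0` models the case in which zeros, rather than destination bits, appear in the unselected positions.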


The actual width of the ALU is 34 bits. There are two extra bits 
used for the high speed signed and unsigned multiplication 
instructions. These two bits are automatically concatenated to 
the most-significant end of the ALU depending upon the width 
specified for the operation. Since the modified Booth algorithm 
requires a two-bit down-shift each cycle, these ALU bits 
generate the two most-significant bits of the partial product. 


The ALU is capable of shifting data down by two bits for the 
multiplication algorithm, up by one bit for the divide algorithm 
and single-bit-up-shifts. 
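

For readers unfamiliar with the two-bit-at-a-time technique, the sketch below shows the textbook radix-4 (modified Booth) recoding for a signed multiplier. It is a generic arithmetic illustration under the assumption of an even operand width, not a model of the Am29C332 multiply-step instructions; the function name and the `prev` variable are hypothetical.

```python
def booth_radix4_multiply(x, y, nbits=32):
    """Multiply signed integers x and y, consuming y two bits per step.

    Each step looks at bits y[i+1], y[i] and the top bit of the previous
    pair, and adds -2x, -x, 0, +x or +2x at weight 2**i."""
    if nbits % 2:
        raise ValueError("nbits must be even")
    y_bits = y & ((1 << nbits) - 1)   # two's complement view of the multiplier
    prev = 0                          # bit just below the current pair
    acc = 0
    for i in range(0, nbits, 2):
        b0 = (y_bits >> i) & 1
        b1 = (y_bits >> (i + 1)) & 1
        digit = b0 + prev - 2 * b1    # recoded digit in {-2, -1, 0, 1, 2}
        acc += (digit * x) << i
        prev = b1
    return acc
```

A 32-bit multiplier is consumed in 16 steps, which is the motivation for shifting the partial product down by two bits per cycle.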


The processor is capable of performing BCD arithmetic on 
packed BCD numbers. The ALU has separate carry logic for 
BCD operations. This logic generates nibble carries (BCD digit 
carry) from propagate and generate signals formed from the A 
and B operands. In order to simplify the hardware while 
maintaining throughput, the BCD add and subtract operations 
are performed in two cycles. In the first cycle, ordinary binary 
addition or subtraction is performed and BCD nibble carries 
are generated. These are blocked from affecting the result at 
this stage, but are saved in the status register to be used later 
for BCD correction (NC0-NC7). In the second cycle all BCD 
numbers are adjusted by examining the previously generated 
nibble carries. Since all the necessary information is stored in 
the status register, the processor can be interrupted after the 
first BCD cycle. 
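

The two-cycle BCD addition can be sketched as follows. The split into two functions mirrors the two cycles described above, but the correction rule used in the second step (adding 6 at every digit position that produced a decimal carry) is the standard packed-BCD adjustment and is an assumption here, as are the function names.

```python
MASK32 = 0xFFFFFFFF

def bcd_add_cycle1(a, b):
    """Cycle 1: plain binary add, plus the eight decimal digit carries."""
    raw = (a + b) & MASK32
    nibble_carries = 0
    carry = 0
    for i in range(8):
        da = (a >> (4 * i)) & 0xF
        db = (b >> (4 * i)) & 0xF
        carry = 1 if da + db + carry > 9 else 0   # decimal carry out of digit i
        nibble_carries |= carry << i
    return raw, nibble_carries

def bcd_add_cycle2(raw, nibble_carries):
    """Cycle 2: adjust the binary sum using the saved nibble carries."""
    correction = 0
    for i in range(8):
        if (nibble_carries >> i) & 1:
            correction |= 6 << (4 * i)
    return (raw + correction) & MASK32

raw, nc = bcd_add_cycle1(0x00000019, 0x00000028)
total = bcd_add_cycle2(raw, nc)   # packed BCD for 19 + 28
```

Because `raw` and `nc` fully describe the intermediate state, the second step can run at any later time, which is the property that makes the operation interruptible.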


Priority Encoder 


The priority encoder is provided to support floating-point 
arithmetic and some graphics primitives. The priority encoder 
takes up to 32 bits as input and generates a 5-bit wide binary 
code to indicate location of the most significant one in the 
operand. Input to the priority encoder comes from the input 
multiplexer, which masks all bits that the user does not want to 
participate in the prioritization. The priority encoder supports 8, 
16, 24 and 32-bit operations depending upon the byte width 
specified. For each data type the priority encoder generates 
the appropriate binary weighted code. For example, when a 
byte width of two is specified (I7-I8 = 10), the output of the 
encoder is zero when bit 15 is HIGH. However, if byte width of 
four is specified (I8-I7 = 00), the output of encoder is 16 
(decimal) if bit 15 is HIGH and bits 31-16 are LOW. Table 4 
shows the output for each data type. If none of the inputs are 
HIGH or the most significant bit of the data type specified is 
HIGH, then the output is zero. The difference between these 
two cases is indicated by the Z-flag of the status register which 
is HIGH only if all inputs are zero. 
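

The encoding rule above amounts to counting how far the highest set bit lies below the most significant bit of the selected data type. A small Python sketch, with a hypothetical function name and with the byte width passed as a plain count of bytes rather than as the I7/I8 code:

```python
def prioritize(value, nbytes=4):
    """Return (code, z_flag) for the highest-priority ONE in `value`.

    code is 0 when the top bit of the selected width is set and grows by
    one for each position further down; z_flag is True only when no bit
    within the selected width is set."""
    if nbytes not in (1, 2, 3, 4):
        raise ValueError("byte width must be 1 to 4")
    nbits = 8 * nbytes
    v = value & ((1 << nbits) - 1)    # bits above the data type are ignored
    if v == 0:
        return 0, True
    return nbits - v.bit_length(), False
```

With bit 15 as the highest set bit, `prioritize(0x8000, 2)` gives code 0 and `prioritize(0x8000, 4)` gives code 16, matching the two cases in the text. For a normalized floating-point mantissa, the code is the upshift needed to bring the leading ONE to the top bit.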


Q-Register 


The Q-register holds dividend and quotient bits for division, 
and multiplier and product bits for multiplication. During 
division, the contents of the Q-register are shifted left, a bit at 
a time, with quotient bits inserted into bit 0. During multiplication, 
the contents of the Q-register are shifted right, two bits at 
a time, with product bits inserted into the most-significant two 
bits (according to the selected byte width). The Q-register may 
be loaded from the A or B inputs and read onto the Y bus. 


Master-Slave Comparator 


All ALU outputs (except MSERR) employ three-state buffers. 
The master-slave comparator compares the input and output 
of each buffer. Any difference causes the MSERR signal to be 
made true. In Slave mode, all output buffers are disabled. 
Outputs from a second ALU may then be connected to the 
equivalent pins of the first. The comparator in the slave will 
then detect any difference in the results generated by the two. 
When the Y bus is three-stated by making Output-Enable 
false, the Y bus master-slave comparators are disabled. 


Parity Logic 


For each byte of the DA and DB inputs there is an associated 
parity bit (8 in all). If a parity error is detected on any byte, the 
Parity-Error signal is made true. Four parity signals (one per 
byte) are also generated for the Y bus outputs. EVEN parity is 
employed for the Am29C332. 
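

A byte-wise parity generator and checker can be sketched as below. The sketch assumes the usual meaning of even parity (the parity bit makes the total number of ONEs in the byte plus parity bit even) and orders the parity bits with index 0 for the least significant byte; both the ordering and the function names are assumptions.

```python
def even_parity_bit(byte):
    """Parity bit that makes the count of ONEs (data plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1

def generate_parity(word):
    """Four parity bits for a 32-bit word, least significant byte first."""
    return [even_parity_bit(word >> (8 * i)) for i in range(4)]

def parity_error(word, parity_bits):
    """True if any byte of `word` disagrees with its supplied parity bit."""
    return generate_parity(word) != list(parity_bits)
```

In this model the error signal would be the OR of `parity_error` evaluated on the DA inputs and on the DB inputs.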


Status Register 


All necessary information about operations performed in the 
ALU is stored in the 32-bit wide status register after every 
microcycle. Since the register can be saved, an interrupt can 
occur after any cycle. The status register can be loaded from 
either the A or B input of the chip and can be read out on the Y 
bus for saving in an external register file. For loading, the byte 
width indicates how many bytes are to be updated. The status 
register is only updated if the HOLD input is inactive. 


Each byte of the status register holds different types of 
information (see Figure 3). The least significant byte (bits 0 to 
7) holds eight position bits (PR0-PR7) for the data shifter. 
The two most significant bits are not used. The next most 
significant byte (bits 8 to 15) holds the 5-bit width field 
(WR0-WR4) for the mask generator. The three most-significant 
bits of that byte (bits 13 to 15) are read-only bits that 
represent three different conditions extracted from the other 
bits of the status register. They are C + Z, N ⊕ V, and (N ⊕ 
V) + Z for bits 13, 14 and 15 respectively. These bits can be 
read on the Y0 pin by the extract-status instruction. The next 
byte contains all the necessary information generated by an 
ALU operation. The least-significant four bits (bits 16 to 19) 
hold carry, negative, overflow and zero flags. Bit 20 holds link 
information for single bit shifts and bits 21 and 22 are used by 
the multiply and divide instructions. The M flag holds the 
multiplier bit for the modified Booth algorithm or it holds the 
sign comparison result for the divide algorithm. The S flag 
holds the sign of the partial remainder for unsigned division. 
Both the flags (M and S) are provided as a part of the status 
register so that multiply and divide instructions can be inter- 
rupted at microinstruction boundaries. The most significant 
byte of the status register holds nibble carries for BCD 
arithmetic. Since BCD arithmetic is performed in two cycles, 
the nibble carries are saved in the first cycle and used in the 
second cycle. Since all the information is stored, BCD instruc- 
tions are also interruptible at the microinstruction boundary. 
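

The bit layout described above (and shown in Figure 3) can be expressed as a small decoder. The field names in the returned dictionary are illustrative labels, and the function names are hypothetical; the bit positions follow the text.

```python
def decode_status(sr):
    """Split a 32-bit status word into the fields described in the text."""
    return {
        "position": sr & 0xFF,             # bits 0-7 (top two bits unused by the ALU)
        "width": (sr >> 8) & 0x1F,         # bits 8-12
        "c_or_z": (sr >> 13) & 1,          # bit 13, read only: C + Z
        "n_xor_v": (sr >> 14) & 1,         # bit 14, read only: N xor V
        "n_xor_v_or_z": (sr >> 15) & 1,    # bit 15, read only: (N xor V) + Z
        "carry": (sr >> 16) & 1,
        "negative": (sr >> 17) & 1,
        "overflow": (sr >> 18) & 1,
        "zero": (sr >> 19) & 1,
        "link": (sr >> 20) & 1,
        "m": (sr >> 21) & 1,               # multiply (and divide) bit
        "s": (sr >> 22) & 1,               # sign flag
        "nibble_carries": (sr >> 24) & 0xFF,
    }

def derived_conditions(c, z, n, v):
    """Recompute the three read-only bits from the C, Z, N and V flags."""
    return {"c_or_z": c | z, "n_xor_v": n ^ v, "n_xor_v_or_z": (n ^ v) | z}
```

Since bits 13 to 15 are combinational functions of the flags, a software model only needs to store C, Z, N and V and can recompute the three condition bits on demand.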
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TABLE 4. 


Highest Priority     Encoder 
Active Bit           Output 

I7-I8 = 00 (32-bit) 
None                 0 
31                   0 
30                   1 
29                   2 
28                   3 
...

I7-I8 = 01 (8-bit) 
None                 0 
...

I7-I8 = 10 (16-bit) 
None                 0 
15                   0 
14                   1 
13                   2 
12                   3 
...

I7-I8 = 11 (24-bit) 
None                 0 
23                   0 
22                   1 
21                   2 
20                   3 
...




Status 0-7: Position Register 
  Bit:    7    6    5    4    3    2    1    0 
  Field:  PR7  PR6  PR5  PR4  PR3  PR2  PR1  PR0 

Status 8-12: Width Register 
Status 13: C + Z            (Read Only, UNSIGNED) 
Status 14: N ⊕ V            (Read Only, SIGNED) 
Status 15: (N ⊕ V) + Z      (Read Only, SIGNED) 

Status 16: Carry 
Status 17: Negative 
Status 18: Overflow 
Status 19: Zero 
Status 20: Link 
Status 21: Multiply (and divide) Bit 
Status 22: Sign Flag 
Status 23: 0 

Status 24-31: Nibble Carries 


Note: Overflow is defined as follows: 
V = (carry in to MSB) ⊕ (carry out of MSB) 




Figure 3. ALU Status Register Bit Assignment 
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Am29C332 INSTRUCTION SET 
Data Types 
The Am29C332 supports the following data types: 


1. Integer 
2. Binary-coded decimal 
3. Variable-length bit field 


The first two data types fall into the category of byte boundary 
aligned operands (Figure 4). The size of the operand could be 
1 byte, 2 bytes, 3 bytes or 4 bytes. All operands are least 
significant bit (bit 0) aligned. The byte width is determined by 
bits I8 and I7 of the instruction as shown in Table 5. 


TABLE 5. 


I7-I8     Width in Bytes 

00        4 
01        1 
10        2 
11        3 


The third data type has operands of variable width (1 to 32 
bits) as shown in Figure 4. The operand is specified by width 
inputs (W0-W4) and position inputs (P0-P5). The position 
inputs indicate the least significant bit position of the operand. 
Depending on bits I8 and I7 of the instruction, the width and 
position inputs can be selected from either the Status Register 
or the Width and Position Pins as shown in Table 6. A 
summary of the data types available is illustrated in Table 7. 


[Figure panel: Byte Boundary Aligned Operands]

[Figure panel: Variable-Length Bit Field]


p = Bit displacement of the least significant field with respect 
to bit 0. 
w = Width of bit field. 


Figure 4. Data Types 


TABLE 6. 


TABLE 7. 


Data Type   Size                      Signed                 Unsigned 

Integer     1 byte    8 bits          -128 to +127           0 to 255 
            2 bytes   16 bits         -2^15 to +2^15 - 1     0 to 2^16 - 1 
            3 bytes   24 bits         -2^23 to +2^23 - 1     0 to 2^24 - 1 
            4 bytes   32 bits         -2^31 to +2^31 - 1     0 to 2^32 - 1 

BCD         1 to 4 bytes (8 digits)   Numeric, 2 digits per byte. 
                                      Most-significant digit may be 
                                      used for sign. 

Variable    1 to 32 bits              Dependent on position and 
                                      width inputs. 






Instruction Format 
The Am29C332 has two types of Instruction Formats: 
1. Byte Boundary Aligned Instructions (FORMAT 1): 


I8  I7  I6 ... I0 


2. Variable-Length Field Bit Instructions (FORMAT 2): 


I8  I7  I6 ... I0 





For instructions that allow a field to be shifted up or down, 
P0-P5 is a two's-complement number in the range -32 to 
+31 representing the direction and magnitude of the shift. For 
instructions that assume a fixed field position, P0-P4 represent 
the position of the least-significant bit of the field and P5 
is ignored. 

Instruction Classification 
ALU instructions can be classified as follows: 
A. Byte Boundary Aligned Operand Instructions: 


1. Arithmetic 
- Binary, BCD 
- Multiply steps 
- Division steps (single and multiple precision) 


Prioritize 


2. 

3. Logical 
4. Single-bit shifts 
5. 


Data movement 
B. Variable-Length Bit Field Operand Instructions: 
1. N-bit shifts and rotates 
2. Bit manipulations 
3. Field logical operations (aligned, non-aligned, extract) 


4. Mask generation 


Three-fourths of the ALU instructions apply to operands that 
are byte boundary aligned. For these instructions, two orthog- 
onal issues are the width of the operand (in bytes) and the 
contents of the high order unselected bytes on the Y bus. As 
mentioned earlier, the width of the operand is specified by I8 
and I7. With the exception of a few instructions, the unselected 
bytes are assigned values as follows: for single operand 
instructions, unselected bytes are passed unchanged from the 
source (A or B). For two operand instructions, unselected 
bytes are passed unchanged from the destination (B input). 
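
The pass-through rule for unselected bytes can be illustrated with a small Python model. This is a sketch, not the device's implementation: the helper names are hypothetical, and the width encoding (01, 10, 11, 00 for 1, 2, 3, 4 bytes) is taken from the flag description in this data sheet.

```python
MASK32 = 0xFFFFFFFF
# I8-I7 byte-width codes as described in the text.
WIDTH_BYTES = {0b01: 1, 0b10: 2, 0b11: 3, 0b00: 4}


def bytemask(width_code: int) -> int:
    """Mask covering the selected (low-order) bytes."""
    return (1 << (8 * WIDTH_BYTES[width_code & 0b11])) - 1


def merge_unselected(result: int, passthrough: int, width_code: int) -> int:
    """Selected bytes come from `result`, unselected bytes from `passthrough`."""
    m = bytemask(width_code)
    return (result & m) | (passthrough & ~m & MASK32)


def not_a(a: int, width_code: int) -> int:
    """Single-operand case: unselected bytes pass unchanged from the source A."""
    return merge_unselected(~a & MASK32, a, width_code)


def and_ab(a: int, b: int, width_code: int) -> int:
    """Two-operand case: unselected bytes pass unchanged from the B input."""
    return merge_unselected(a & b, b, width_code)


print(hex(not_a(0x12345678, 0b10)))               # 0x1234a987
print(hex(and_ab(0xFFFFFF0F, 0x123456FF, 0b01)))  # 0x1234560f
```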


In the last quarter of the instruction set, the width of the 
operand is from 1 to 32 bits (based on the width input) for field 
operations, 32 bits for N-bit shift operations, and 1 bit for 
bit-oriented operations. In the case of field-aligned and single-bit 
operands, the position bits (P0-P4) determine the least 
significant bit of the operand. In the case of N-bit shifts and 
field non-aligned operands, the position bits P0-P5 form a 6-bit 
signed integer determining the magnitude and direction of the 
shift. 


Flags 

Byte-Aligned Instructions 

The zero flag always looks only at the selected bytes: 
Z ← ((Y AND bytemask(byte width)) = 0) 


Similarly, N ← sign-bit(Y, byte width), where the function 
"sign-bit" returns bit 7, 15, 23, or 31 of the first argument for 
byte widths 01, 10, 11, or 00 respectively. 


Also, C ← carry(byte width) returns the carry from the 
appropriate byte boundary, and: 


V = overflow(byte width) = (carry into MSB) ⊕ (carry 
out of MSB) 


returns the overflow from the appropriate byte boundary. 
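
The flag definitions above can be checked with a short Python model of a byte-aligned add. This is a minimal sketch under assumptions: `add_flags` and its dictionary result are hypothetical names chosen for illustration, and only the selected bytes of Y are returned.

```python
# I8-I7 byte-width codes as described in the text: 01, 10, 11, 00 -> 1, 2, 3, 4 bytes.
WIDTH_BYTES = {0b01: 1, 0b10: 2, 0b11: 3, 0b00: 4}


def add_flags(a: int, b: int, width_code: int, carry_in: int = 0) -> dict:
    """Add the selected bytes of a and b and derive Z, N, C, V at that width."""
    bits = 8 * WIDTH_BYTES[width_code & 0b11]
    mask = (1 << bits) - 1
    msb = bits - 1

    total = (a & mask) + (b & mask) + carry_in
    y = total & mask
    carry_out = (total >> bits) & 1            # carry from the byte boundary

    # Carry into the MSB: add everything below the MSB and look at bit `msb`.
    low = (a & (mask >> 1)) + (b & (mask >> 1)) + carry_in
    carry_into_msb = (low >> msb) & 1

    return {
        "Y": y,                                # selected bytes only
        "Z": int(y == 0),                      # zero test ignores unselected bytes
        "N": (y >> msb) & 1,                   # bit 7, 15, 23, or 31
        "C": carry_out,
        "V": carry_into_msb ^ carry_out,
    }


print(add_flags(0x7F, 0x01, 0b01))
# {'Y': 128, 'Z': 0, 'N': 1, 'C': 0, 'V': 1}
```

Note that a 1-byte add of 0x12345680 and 0x80 reports Z = 1 even though the upper bytes of the operand are non-zero, since only the selected byte is examined.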


The link (L) flag is generally loaded with the bit moved out of 
the highest selected byte in the case of upshifts, or the bit 
moved out of the least significant byte for downshifts. Figure 5 
shows the shift operation using link bit. Other status flags have 
specialized uses, explained in the following sections. 


[Diagram: Shift Down across 1, 2, 3, or 4 selected bytes through the link bit] 


Figure 5. Upshift/Downshift Using Link Bit 


Variable-Length Field Instructions: 


Generally, only N and Z are affected. N takes the most-significant 
bit of the 32-bit result (i.e., N ← Y31). Z detects 
zeros in the selected field of the result (i.e., Z ← ((Y AND 
bitmask(position, width)) = 0)). 
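
A small Python sketch of the field flags follows. The name `bitmask` mirrors the function used in the text; `field_flags` is a hypothetical helper. Clipping the mask at bit 31 is an assumption based on the data sheet's statement elsewhere that field operations do not wrap around.

```python
def bitmask(position: int, width: int) -> int:
    """Ones in bits position .. position+width-1, clipped at bit 31."""
    top = min(position + width, 32)
    return ((1 << top) - 1) & ~((1 << position) - 1)


def field_flags(y: int, position: int, width: int) -> dict:
    """N is bit 31 of the full result; Z tests only the selected field."""
    y &= 0xFFFFFFFF
    return {"N": (y >> 31) & 1, "Z": int((y & bitmask(position, width)) == 0)}


print(hex(bitmask(10, 8)))                 # 0x3fc00
print(field_flags(0xC0005734, 16, 14))     # {'N': 1, 'Z': 1}
```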


Output Select 


The Register Status pin, RS, may be used to switch the C, Z, 
N, V, and L output pins between the direct output of the ALU 
and the outputs of the corresponding bits in the status register. 
If the direct status output is selected, then for instructions that 
do not affect a particular flag (e.g., carry for logical arithmetic) 
that output will reflect the state of its corresponding bit in the 
status register. Similarly, when the HOLD signal is made 
HIGH, the C, Z, N, V and L pins will be made equal to the 
contents of the status register, regardless of the RS input. 
INSTRUCTION SET SUMMARY 


Operand Size: Variable Byte Width: 1, 2, 3, 4 Bytes 
Data Type: Binary Integer and BCD; Binary Integer 

Arithmetic 
• Increment by one, two, four 
• Decrement by one, two, four 
• Add, addc (carry = macro/micro) 
• Sub, subr 
• Subc, subrc (carry/borrow) 
• BCD sum and difference correct steps 
• Negate (two's complement) 
• Multiply steps (modified Booth) (signed and unsigned) 
• Divide steps (non-restoring) (single and double precision) 

Single-Bit Shift 
• Upshift with 0, 1, link fill 
• Downshift with 0, 1, link, sign fill 

Data Movement 
• Zero extend 
• Sign extend 
• Pass-status, Q-Reg 
• Load-status, Q-Reg 
• Merge 


Operand Size: 32 Bits 
Data Type: Binary 

N-Bit Shift and Rotate 
• Upshift by 0 to 31 bits with 0 fill 
• Downshift by 1 to 32 bits with 0, sign fill 
• Rotate by 0 to 31 bits 


Operand Size: Single Bit 

Bit Manipulation 
• Extract 
• Set 
• Reset 


Operand Size: Variable Length Bitfield: 1 to 32 Bits 

Field Logical (aligned and non-aligned) 
• Not, OR, XOR, AND, extract, insert 
ZERO-EXTA 
ZERO-EXTB 
SIGN-EXTA 
SIGN-EXTB 
PASS-STAT 
PASS-Q 
LOADQ-A 
LOADQ-B 
NOT-A 
NOT-B 
NEG-A 
NEG-B 
PRIOR-A 
PRIOR-B 
MERGEA-B 
MERGEB-A 


DECR-A 
DECR-B 
INCR-A 
INCR-B 
DECR2-A 
DECR2-B 
INCR2-A 
INCR2-B 
DECR4-A 
DECR4-B 
INCR4-A 
INCR4-B 
LDSTAT-A 
LDSTAT-B 


INSTRUCTION SET GLOSSARY 


(Sorted by Opcode in Hex Notation) 


DN1-0F-A 
DN1-0F-B 
DN1-0F-AQ 
DN1-0F-BQ 
DN1-1F-A 
DN1-1F-B 
DN1-1F-AQ 
DN1-1F-BQ 
DN1-LF-A 
DN1-LF-B 
DN1-LF-AQ 
DN1-LF-BQ 
DN1-AR-A 
DN1-AR-B 
DN1-AR-AQ 
DN1-AR-BQ 


UP1-0F-A 
UP1-0F-B 
UP1-0F-AQ 
UP1-0F-BQ 
UP1-1F-A 
UP1-1F-B 
UP1-1F-AQ 
UP1-1F-BQ 
UP1-LF-A 
UP1-LF-B 
UP1-LF-AQ 
UP1-LF-BQ 
ZERO 
SUM-CORR-A 
SUM-CORR-B 
DIFF-CORR-A 
DIFF-CORR-B 


SDIVFIRST 
UDIVFIRST 


SDIVSTEP 
SDIVLAST1 
MPDIVSTEP1 
MPSDIVSTEP3 
UDIVSTEP 
UDIVLAST 
MPDIVSTEP2 


MPUDIVSTP3 


REMCORR 
QUOCORR 
SDIVLAST2 
UMULFIRST 
UMULSTEP 
UMULLAST 
SMULSTEP 
SMULFIRST 





Opcode  Name    Opcode  Name    Opcode  Name    Opcode  Name 


NB-SN-SHA 
NB-SN-SHB 
NB-OF-SHA 
NB-OF-SHB 
NBROT-A 
NBROT-B 
EXTBIT-A 
EXTBIT-B 
SETBIT-A 
SETBIT-B 
RSTBIT-A 
RSTBIT-B 
SETBIT-STAT 
RSTBIT-STAT 
NOTF-AL-B 
PASSF-AL-B 


NOTF-A 
NOTF-AL-A 


PASSF-A 


PASSF-AL-A 
ORF-A 
ORF-AL-A 
XORF-A 
XORF-AL-A 
ANDF-A 
ANDF-AL-A 
EXTF-A 
EXTF-B 
EXTF-AB 
EXTF-BA 
EXTBIT-STAT 
PASS-MASK 














TABLE 6-1. DATA MOVEMENT INSTRUCTIONS 

Mnemonics   Code  Description            Unsel  Sel
ZERO-EXTA   00    Zero Extend            0      A
SIGN-EXTA   02    Sign Extend            Sign   A
MERGEA-B    0E    Merge A with B         B      A Merge B
MERGEB-A          Merge B with A                B Merge A
PASS-STAT         Pass Status Register
LDSTAT-A          Load Status Register
LDSTAT-B    1D
PASS-Q      05    Pass Q Register
LOADQ-A     06
LOADQ-B     07


Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 








Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 

A =A Input 

B =B Input 

Q=Q Register 

+ = Updated only if byte width is 3 or 4 

* = Updated 
Examples: 
2, ZERO-EXTB Pass lower two bytes of B to Y with zero fill on upper two bytes 


0, LOADQ-A Load all four bytes of A into Q Register; pass updated Q Register to Y 
TABLE 7. LOGICAL INSTRUCTIONS 


Mnemonics Description pie, __se sf ty N 
i a ee Sears 


NOT-A | 08 | One's Complement 


Pass Zero 
Pass Sign 





Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 


Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 


A=A Input 

B = B Input 

Q=Q Register 

* = Updated 

Examples: 

2, NOT-A Complement low order two bytes of A and output to Y with 
high order two bytes of A uncomplemented. 

1, AND AND first byte of A and B. Output to Y with high three 
bytes of B. 


TABLE 8-1. SINGLE-BIT SHIFT INSTRUCTIONS (SINGLE PRECISION) 


Mnemonics   Code  Description            Unsel  Sel                    Status 


DN1-0F-A          Downshift, Zero Fill   A      Yi = Ai+1, Ymsb = 0 










DN1-0F-B 


DN1-1F-A          Downshift, One Fill    A      Yi = Ai+1, Ymsb = 1 
DN1-1F-B 







DN1-LF-B 
| DN1-AR-A | 2C | Downshift, Sign Fill 
DN1-AR-B oes 

UP1-0F-A 
UP1-0F-B 
UP1-1F-A 
UP1-1F-B 
UP1-LF-A Upshift, Link Fill 


Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 







Example: 
2, UP1-1F-A Shift lower two bytes of A up one bit. Set LSB to 1. Fill 
unselected bytes with upper two bytes of A. 
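
The single-bit shifts can be modeled in Python as follows. This is a sketch with hypothetical function names; it assumes the link bit receives the bit shifted out of the selected bytes, as described in the Flags section, and that unselected bytes pass through from the source operand.

```python
MASK32 = 0xFFFFFFFF
# I8-I7 byte-width codes as described in the text.
WIDTH_BYTES = {0b01: 1, 0b10: 2, 0b11: 3, 0b00: 4}


def up1(a: int, width_code: int, fill: int):
    """Upshift the selected bytes by one bit; returns (Y, link)."""
    bits = 8 * WIDTH_BYTES[width_code & 0b11]
    mask = (1 << bits) - 1
    link = (a >> (bits - 1)) & 1                 # bit leaving the highest selected byte
    selected = ((a << 1) | (fill & 1)) & mask    # fill bit enters at the LSB
    return selected | (a & ~mask & MASK32), link


def dn1(a: int, width_code: int, fill: int):
    """Downshift the selected bytes by one bit; returns (Y, link)."""
    bits = 8 * WIDTH_BYTES[width_code & 0b11]
    mask = (1 << bits) - 1
    link = a & 1                                 # bit leaving the least significant byte
    selected = ((a & mask) >> 1) | ((fill & 1) << (bits - 1))
    return selected | (a & ~mask & MASK32), link


# 2, UP1-1F-A with A = 0x1234C001
y, link = up1(0x1234C001, 0b10, fill=1)
print(hex(y), link)   # 0x12348003 1
```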
                                         B      Yi = Bi+1, Ymsb = 0 
                                         B      Yi = Bi+1, Ymsb = 1 





TABLE 8-2. SINGLE-BIT SHIFT INSTRUCTIONS (DOUBLE PRECISION) 


Mnemonics   Code  Description            Y Output & Q Register (Selected Bytes)
DN1-0F-AQ         Downshift, Zero Fill   0 → A → Q   2)
DN1-0F-BQ                                0 → B → Q   3)
                  Downshift, Link Fill
DN1-AR-AQ         Downshift, Sign Fill
DN1-AR-BQ
UP1-0F-AQ   32    Upshift, Zero Fill
UP1-0F-BQ   33
UP1-1F-AQ         Upshift, One Fill
UP1-1F-BQ   37
UP1-LF-AQ   3A    Upshift, Link Fill
UP1-LF-BQ   3B


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 
2. Y Unselected byte from A, Q Unselected byte unchanged. 
3. Y Unselected byte from B, Q Unselected byte unchanged. 








Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A=A Input 
B=B Input 
Q=Q Register 
* = Updated 


Example: 
0, DN1-AR-BQ Shift 64 bits (all 32 bits of both B and Q) 
down by one bit. LSB of B fills MSB of Q. 
MSB of B set to sign bit (bit N of status register). 


[Diagram: B (32 bits) and Q (32 bits), with the link status bit] 
3, UP1-LF-AQ Shift 48 bits (24-bits of A and 24-bits of Q) 
up by one bit. MSB of 24-bit Q fills LSB of A. 
MSB of 24-bit A sets link status bit. LSB of 
Q is filled with original link value. 
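
The UP1-LF-AQ example can be traced with a short Python model. This is an illustrative sketch: the function name and return convention are hypothetical, and the treatment of unselected bytes (Y from A, Q unchanged) follows Note 2 of Table 8-2.

```python
MASK32 = 0xFFFFFFFF
# I8-I7 byte-width codes as described in the text.
WIDTH_BYTES = {0b01: 1, 0b10: 2, 0b11: 3, 0b00: 4}


def up1_lf_aq(a: int, q: int, link: int, width_code: int):
    """Double-precision upshift with link fill; returns (Y, new Q, new link)."""
    bits = 8 * WIDTH_BYTES[width_code & 0b11]
    mask = (1 << bits) - 1
    q_msb = (q >> (bits - 1)) & 1
    new_link = (a >> (bits - 1)) & 1           # MSB of selected A sets the link bit
    y_sel = ((a << 1) | q_msb) & mask          # MSB of selected Q fills LSB of A
    q_sel = ((q << 1) | (link & 1)) & mask     # original link fills LSB of Q
    y = y_sel | (a & ~mask & MASK32)           # unselected Y bytes come from A
    new_q = q_sel | (q & ~mask & MASK32)       # unselected Q bytes are unchanged
    return y, new_q, new_link


# 3, UP1-LF-AQ: 24 bits of A and 24 bits of Q
y, q, l = up1_lf_aq(0xAA800000, 0xBB800001, link=1, width_code=0b11)
print(hex(y), hex(q), l)   # 0xaa000001 0xbb000003 1
```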
[Diagram: A (24 bits) and Q (24 bits)] 
TABLE 9. PRIORITIZE INSTRUCTIONS 


Mnemonics   Code  Description  Y Output  Status 


PRIOR-A 
PRIOR-B     0D 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 
2. Priority also loaded into STATUS <7:0> 
3. Refer to Table 4. 


Legend: A=A Input 
B =B Input 
Q =Q Register 
* = Updated 
Example: 
3, PRIOR-A Value placed on Y is 2 


Assume A is 01001011 00100010 00000000 00000000 
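
The example result can be reproduced by counting the zero bits above the highest 1 within the selected bytes. The Python sketch below is an assumption drawn only from this example (Table 4, which defines the exact priority encoding, is not reproduced here); the function name is hypothetical, and the all-zero case is deliberately left out.

```python
# I8-I7 byte-width codes as described in the text.
WIDTH_BYTES = {0b01: 1, 0b10: 2, 0b11: 3, 0b00: 4}


def prioritize(a: int, width_code: int) -> int:
    """Count of zero bits above the highest 1 in the selected bytes of a."""
    bits = 8 * WIDTH_BYTES[width_code & 0b11]
    selected = a & ((1 << bits) - 1)
    if selected == 0:
        raise ValueError("all-zero input: see Table 4 for the device's behavior")
    return bits - selected.bit_length()


# 3, PRIOR-A with A = 01001011 00100010 00000000 00000000
print(prioritize(0b01001011_00100010_00000000_00000000, 0b11))   # 2
```

With a 3-byte width only the lower 24 bits (00100010 00000000 00000000) are examined, which is why the leading byte of A does not affect the result.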


TABLE 10-1. ARITHMETIC INSTRUCTIONS 


| ¥Output | Status 
fate con | mnogtn [ee] 5 RLETRIE 
EES 


[Nea | 08 ref ee [PPP err 


aie I elele le 
pimig 
Increment by Two 
ers 
Increment by Four 
ence 
Decrement by One 


DECR2-A     14    Decrement by Two 
DECR2-B 
Decrement by Four A Fis 
ee Sie 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 

2. Borrow, rather than carry, is generated if BOROW is HIGH (borrow = carry). 

3. Nibble bits are set by these instructions. NEG-A (or NEG-B) and DIFF-CORR may be used to 
form 10's complement of a BCD number. Use SUM-CORR (for increment) or DIFF-CORR (for 
decrement) to increment or decrement a BCD number. 















w 


A 








A 
A 
A 
A 


Gj Nh 


Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A =A Input 
B =B Input 
Q=Q Register 
* = Updated 


Example: 
2, DECR4-A Decrement lower two bytes of A by 4 
TABLE 10-2. ARITHMETIC INSTRUCTIONS 


id | vOut | Status 
Description | Unsel | sel SM |t iz] VNC 


0 7 
Panne «(| Ada wih Cary [8 [avere aff] PP 
Te farses 
Te fermes 


















Subtract with Cary { B [A+B+1+¢ 2)6) | | | |*[*[*|*_ 
ee eee ee mre 
Correct BCD Nibbles | A [Corrected A 3) |_| | |*[*[*|*_ 
a |B |ConecteaB 9) | | | [*| i "| 
Correct BCD Nibbles | A [Corrected A 9) |_| | |*|*|* | * 
ae ce ae 





Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 

2. BOROW is LOW. For subtract operations, a borrow rather than a carry is stored in STATUS if BOROW is HIGH. 
Carry is always generated for ADD regardless of BOROW. 

3. First, the nibble carries NC0-NC7 are tested. Any nibble carry/borrow that is set to 1 generates "6" internally as 
a correction word and then the correction word is added (SUM-CORR-) or subtracted (DIFF-CORR-) from the 
operand. NC0-NC7 are not affected by this operation. 

4. Use SUM-CORR or DIFF-CORR to add or subtract a BCD number. 

5. Use ADDC, SUBC, or SUBRC to perform operations on integers longer than 32 bits. 

6. Carry bit is obtained from MCin if M/m is HIGH. Otherwise, carry is obtained from the C status bit. 
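
Note 5 can be illustrated with a 64-bit addition built from two 32-bit steps. The Python below is a sketch with hypothetical function names: the first step plays the role of ADD, and the second plays the role of ADDC taking its carry from the C status bit left by the first.

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int, carry_in: int = 0):
    """One 32-bit add; returns (result, carry out)."""
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """64-bit add as two 32-bit steps chained through the carry."""
    lo, c = add32(a_lo, b_lo)          # ADD on the low words
    hi, c = add32(a_hi, b_hi, c)       # ADDC on the high words, using the saved carry
    return hi, lo, c


# 0x00000001_FFFFFFFF + 0x00000000_00000001 = 0x00000002_00000000
print(add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001))   # (2, 0, 0)
```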


Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A = A Input 
B = B Input 
Q=Q Register 


* = Updated only if byte width is 3 or 4 


Example: 
0, ADD Add two 32-bit two's-complement integers 


TABLE 11-1. DIVIDE INSTRUCTIONS (Aligned Format) 


-1 
oma Description Bir vies Isjm(e{zivi{nicl 
| Signed Divide Steps 


Signed Divide Steps 


Psovrinst [46 | Fist instruction for Signed wee TST VO | 
[sovster [60 | Rerate Stop (#bis-time) «| ~—8 —Sd| ve | || ls 
SOIMIASTI fe se ie el ea ee 
SOMASTE eee 
Unsigned Divide Steps 
UDIVSTEP Iterate Step (#bits - 1 times) 
Multiprecision Divide Steps 


MPUDIVSTP3 Used for Unsigned Divide 
Correction Steps 


REMCORR Correct Remainder After Divide 
QUOCORR Correct Quotient After Divide 


TABLE 11-2. EXAMPLE CODING FORM (Signed Division) 





[Coding form: sequencer columns (OP, BRANCH, ...) and Am29C332 columns (OP, WIDTH, POSITION, ..., Y-OUT); legible row: SDIVSTEP] 


Note: Divisor in A, Dividend in A 
Quotient in Q, Remainder in B 


QUOCORR 
REMCORR 


a 
[2 | SowAsTeR 
cs 
Ce 


» 2) 
pb 


Legend: A=A Input 
B =B Input 


S = Status Register 
Q=Q Register 

R1 = Quotient 

R2 = Dividend 

R3 = Remainder 

R4 = Divisor 
TABLE 12-1. MULTIPLY INSTRUCTIONS (Aligned Format) 

Name        Description
Signed Multiply Steps 
SMULFIRST   First multiply instruction 
SMULSTEP    Iterate step (#bits/2 - 1 steps) 
Unsigned Multiply Steps 
UMULFIRST   First multiply instruction 
UMULSTEP    Iterate step (#bits/2 - 1 steps) 
UMULLAST    Last multiply instruction 


TABLE 12-2. EXAMPLE CODING FORM (Unsigned Multiply) 













Am29C332 Am29C334 


[Coding form: legible rows include ZERO, UMULFIRST, UMULSTEP] 


Note: 1. Put ALU output in B. 
2. Multiplicand in A, Multiplier in Q 
Product (HIGH) in B, Product (LOW) in Q 


Legend: A=A Input 

B=B Input 
S = Status Register 
Q=Q Register 

R1 = Multiplier 

R2 = Multiplicand 

R3 = Product (HIGH) 

R4 = Product (LOW) 
TABLE 13. SHIFT/ROTATE INSTRUCTIONS 

Mnemonics   Code  Description             Y Output
NB-0F-SHA   62    Field Shift, Zero Fill  Yi+p = Ai, 0
NB-0F-SHB   63                            Yi+p = Bi, 0
NB-SN-SHA   60    Field Shift, Sign Fill  Yi+p = Ai, N
NB-SN-SHB                                 Yi+p = Bi, N
NBROT-A           Field Rotate            Yi = A(i-p) mod 32   3)
NBROT-B     65                            Yi = B(i-p) mod 32   3)


Notes: 1. These instructions use the field instruction format (FORMAT 2). 
2. "p" stands for bit displacement from P0-P5 or from PR0-PR5 (-32 ≤ p ≤ 31). 
If p is positive, Yp-1 to Y0 are equal to the fill bit. 
If p is negative, Y31 to Y31+p+1 are equal to the fill bit. 
3. The sign of the position input is ignored for this instruction and P0-P4 are treated as a positive magnitude for a 
circular upshift. 
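
The behavior described in Notes 2 and 3 can be sketched in Python. The function names are hypothetical; `fill` stands for the fill bit (0 for the zero-fill forms, the N status bit for the sign-fill forms), and `p` is the already-decoded signed position.

```python
MASK32 = 0xFFFFFFFF


def nb_shift(a: int, p: int, fill: int = 0) -> int:
    """N-bit shift: p > 0 shifts up, p < 0 shifts down; vacated bits get `fill`."""
    a &= MASK32
    if p >= 0:
        vacated = (1 << p) - 1                   # Y(p-1)..Y0
        return ((a << p) & MASK32) | (vacated if fill else 0)
    n = -p
    vacated = (MASK32 << (32 - n)) & MASK32      # Y31..Y(31+p+1)
    return (a >> n) | (vacated if fill else 0)


def nb_rot(a: int, p_bits: int) -> int:
    """Rotate: only P4..P0 are used, as a circular upshift magnitude."""
    n = p_bits & 0x1F
    a &= MASK32
    return ((a << n) | (a >> (32 - n))) & MASK32


print(hex(nb_shift(0x0000000F, 4)))              # 0xf0
print(hex(nb_shift(0x80000000, -17, fill=1)))    # 0xffffc000
print(hex(nb_rot(0x80000001, 4)))                # 0x18
```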


Legend: A=A Input 
B = B Input 
Q=Q Register 
* = Updated 

















Examples: * 
NB-0F-SHA,,4 Shift A up 4 bits and zero fill 


NB-SN-SHB,,-17 Shift B down 17 bits and sign fill 


*Width field not used 


TABLE 14-1. BIT-MANIPULATION INSTRUCTIONS 

Mnemonics    Code  Description  Y Output
EXTBIT-A     66    Bit Extract  if p ≥ 0, Y0 = Ap;  if p < 0, Y0 = NOT A|p|   2)
EXTBIT-B                        if p ≥ 0, Y0 = Bp                            2)
EXTBIT-STAT                     if p ≥ 0, Y0 = Sp;  if p < 0, Y0 = NOT S|p|   2)


Notes: 1. These instructions use the field instruction format (FORMAT 2). 













2. Y31 to Y1 are set to zero. "p" stands for the bit displacement from P0-P4 or from PR0-PR5. The sign of the position input is 
ignored. 






TABLE 14-2. BIT-MANIPULATION INSTRUCTIONS 

Mnemonics     Code  Description  Status
SETBIT-STAT   6C
RSTBIT-STAT   6D


Notes: 1. These instructions use the field instruction format (FORMAT 2). 
2. "p" stands for the bit displacement from P0-P5 or from PR0-PR5. 










Legend: Unsel = Unselected field 
Sel = Selected field 
A=A Input 
B = B Input 
Q=Q Register 
* = Updated 


Examples: 
RSTBIT-B,,3 3rd bit is set to 0 in B 
EXTBIT-STAT,,-4 4th bit in status register is extracted and 
inverted. 
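
The bit-manipulation examples can be modeled with three small Python functions. This is a sketch under assumptions: the function names are hypothetical, and the inverted extract for a negative position is inferred from the EXTBIT-STAT example above.

```python
MASK32 = 0xFFFFFFFF


def setbit(x: int, p: int) -> int:
    """Set bit |p| of x (position taken from P4..P0)."""
    return (x | (1 << (abs(p) & 0x1F))) & MASK32


def rstbit(x: int, p: int) -> int:
    """Reset bit |p| of x."""
    return x & ~(1 << (abs(p) & 0x1F)) & MASK32


def extbit(x: int, p: int) -> int:
    """Extract bit |p| into Y0 (Y31..Y1 are zero); a negative p inverts the bit."""
    bit = (x >> (abs(p) & 0x1F)) & 1
    return bit ^ 1 if p < 0 else bit


print(hex(rstbit(0x0000000F, 3)))    # 0x7   (RSTBIT-B,,3)
print(extbit(0b10000, -4))           # 0     (bit 4 is 1, extracted and inverted)
```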
[Figure 6 panels] 

Aligned Fields 

Non-Aligned Fields Case 1: 
If position (P0-P5) ≥ 0, A is LSB aligned 
Width (W0-W4) = 1 to 32 

Non-Aligned Fields Case 2: 
If position (P0-P5) < 0, B is LSB aligned 
Width (W0-W4) = 1 to 32 


Figure 6. Field Logical Operations 


TABLE 15. FIELD LOGICAL INSTRUCTIONS 

Mnemonics    Code  Description
PASSF-AL-A         Field Pass
PASSF-AL-B
PASSF-A      72
NOTF-AL-A          Field Complement
NOTF-AL-B
NOTF-A
ORF-AL-A           Field OR 3)
ORF-A
XORF-AL-A          Field XOR 3)
XORF-A
ANDF-AL-A          Field AND 3)
ANDF-A
EXTF-A       7A    Field Extract 4) 5)
EXTF-B                           4) 5)
EXTF-AB
EXTF-BA

Notes: 1. These instructions use the field instruction format (FORMAT 2). 

2. p ≤ i ≤ p+w-1. "p" stands for position displacement from P0-P5 or from PR0-PR5 and "w" for the width of the bit field 
from W0-W4 or WR0-WR4. Whenever p + w > 32, operation takes place only over the portion of the field up to the end of 
the word. No wraparound occurs. 

3. This instruction uses the aligned format (see Figure 6). 

4. This instruction uses the unaligned field format (see Figure 6). 
p ≥ 0: Case 1 
p < 0: Case 2 

5. If p is positive, the input is LSB aligned and the Y output is aligned at position. 
If p is negative, the input is aligned at |p| and the Y output at LSB. 

6. First, the concatenation of A (High Word) and B (Low Word) is rotated by the amount specified by the position (p). If p is 
positive, left-rotate is performed. If p is negative, right-rotate is performed. Second, the least significant bits on the Y output 
specified by the width (w) are extracted. 

7. Same as 6) except that B input is taken as a high word and A input as a low word. 
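
Notes 6 and 7 describe a 64-bit rotate followed by a field extraction, which can be sketched in Python as below. The function names are hypothetical, and filling the Y bits above the extracted field with zeros is an assumption, since the notes do not state it.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def extf_ab(a: int, b: int, p: int, w: int) -> int:
    """Rotate A:B (A high, B low) by p, then keep the low w bits of the result."""
    x = ((a & MASK32) << 32) | (b & MASK32)
    n = p % 64                     # left rotate by p; a negative p becomes a right rotate by |p|
    rotated = ((x << n) | (x >> (64 - n))) & MASK64
    return rotated & ((1 << w) - 1)


def extf_ba(a: int, b: int, p: int, w: int) -> int:
    """Same as extf_ab, with B as the high word and A as the low word."""
    return extf_ab(b, a, p, w)


# Right-rotate 0x12345678_9ABCDEF0 by 28 and keep 8 bits: a byte straddling both words.
print(hex(extf_ab(0x12345678, 0x9ABCDEF0, -28, 8)))   # 0x89
```

The example picks bits 28 to 35 of the concatenated word, which shows how the instruction can pull out a field that crosses the boundary between the two inputs.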


Legend: Unsel = Unselected Field 
Sel = Selected Field 
A=A Input 
B=B Input 
Q=Q Register 
* = Updated 


For all examples, assume STATUS (7:0) is -7 and STATUS (12:8) is 3. 


1. 0,PASSF-AL-B,11,20 Pass B to Y and test if B20 to B30 
are all zero. Set Z status if so. 

B: 11000000000000000101011100110100 

Z set to 1 in this case 


2. 3,XORF-A,, Exclusive-OR bits A7-A9 with bits 
B0-B2 and output to Y0-Y2. Pass 
B3-B31 to Y3-Y31. Width and position 
values are obtained from STATUS(12:0). 

A: 0110111000100100001011[100]1101011 

B: 00011100001010001100101001001[001] 

A9-7 ⊕ B2-0 = Y: 00011100001010001100101001001[101] 
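
The XORF-A example can be checked with a short Python model of a non-aligned field XOR. This is a sketch, not the device's implementation: the function names are hypothetical, and the handling of p ≥ 0 (A field LSB aligned, result placed at position p of B) is a reading of Figure 6, Case 1.

```python
MASK32 = 0xFFFFFFFF


def field_mask(position: int, width: int) -> int:
    """Ones in bits position .. position+width-1, clipped at bit 31 (no wraparound)."""
    top = min(position + width, 32)
    return ((1 << top) - 1) & ~((1 << position) - 1)


def xorf_a(a: int, b: int, p: int, w: int) -> int:
    """Non-aligned field XOR; bits outside the field pass through from B."""
    if p >= 0:                       # Case 1: A field at LSB, result at position p
        out_pos = p
        a_aligned = (a << p) & MASK32
    else:                            # Case 2: A field at |p|, result at LSB
        out_pos = 0
        a_aligned = (a & MASK32) >> -p
    m = field_mask(out_pos, w)
    return ((a_aligned ^ b) & m) | (b & ~m & MASK32)


a = 0b01101110001001000010111001101011
b = 0b00011100001010001100101001001001
print(format(xorf_a(a, b, p=-7, w=3), "032b"))
# 00011100001010001100101001001101
```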










TABLE 16. MASK INSTRUCTION 

Mnemonics   Code  Description    Unsel  Sel  Status (S, M, L, Z, V, N, C)
PASS-MASK   7F    Generate Mask

Notes: 1. This instruction uses the field instruction format (FORMAT 2). 
2. p ≤ i ≤ p+w-1. "p" stands for the position displacement and "w" for the width of bit field. 











Legend: Unsel = Unselected Field 
Sel = Selected Field 
A=A Input 
B = B Input 
Q=Q Register 
* = Updated 


Example: Generates an 8-bit field mask pattern starting from bit position 10. 


[Diagram: bit ruler 31 ... 18 | 17 ... 10 | 9 ... 0, with the mask field in bits 17-10] 
ABSOLUTE MAXIMUM RATINGS 

Storage Temperature ......................... -65 to +150°C 
Case Temperature Under Bias (TC) ............ -55 to +125°C 
Supply Voltage to Ground Potential 
  Continuous ................................ -0.3 to +7.0 V 
DC Voltage Applied to Outputs 
  for HIGH Output State ..................... -0.3 V to VCC + 0.3 V 
DC Input Voltage ............................ -0.3 V to VCC + 0.3 V 
DC Output Current, Into LOW Outputs ......... 30 mA 
DC Input Current ............................ -10 mA to +10 mA 

Stresses above those listed under ABSOLUTE MAXIMUM 
RATINGS may cause permanent device failure. Functionality 
at or above these limits is not implied. Exposure to absolute 
maximum ratings for extended periods may affect device 
reliability. 


OPERATING RANGES 

Commercial (C) Devices 
  Temperature (TA) .......................... 0 to +70°C 
  Supply Voltage VCC ........................ +4.75 V to +5.25 V 

Military* (M) Devices 
  Temperature (TA) .......................... -55 to +125°C 
  Supply Voltage VCC ........................ +4.5 V to +5.5 V 

Operating ranges define those limits between which the 
functionality of the device is guaranteed. 

*Military product 100% tested at TA = +25°C, +125°C, and 
-55°C. 








DC CHARACTERISTICS over operating range unless otherwise specified (for A... ducts, Group A, 
Subgroups 1, 2, 3 are tested unless otherwise noted) 

Symbol  Parameter                                        Test Conditions (Note 1)
VOH     Output HIGH Voltage                              VCC = Min., VIN = VIH or VIL
VOL     Output LOW Voltage                               VCC = Min., VIN = VIH or VIL
VIH     Guaranteed Input Logical HIGH Voltage (Note 2)
VIL     Guaranteed Input Logical LOW Voltage (Note 2)
IIL     Input LOW Current
IIH     Input HIGH Current
IOZH    Off State (High Impedance) Output Current
ICC     Static Power Supply Current (Note 3)
CPD*    Power Dissipation Capacitance (Note 4)           pF Typical


Notes: 1. VCC conditions shown as Min. or Max. ref... Commercial or Military VCC limits. 
2. These input levels provide zero-noise i... and should only be statically tested in a noise-free environment (not functionally 
tested). 
3. Worst-case ICC is measured at the ... temperature in the specified operating range. 
4. CPD determines the no-load dynamic ... consumption: 
ICC (Total) = ICC (Static) + CPD ... where f is the switching frequency of the majority of the internal nodes, normally one-half 
of the clock frequency. 


*This parameter is not tested. 
SWITCHING CHARACTERISTICS over COMMERCIAL operating range 
A. COMBINATIONAL PROPAGATION DELAYS 


29C332 29C332-1 29C332-2 
To Max. Delay Max. Delay Max. Delay 
20 18 









PA0 - PA3, PB0 - PB3 
DA0 - DA31, DB0 - DB31 
DA0 - DA31, DB0 - DB31 
DA0 - DA31, DB0 - DB31 














28 
29 53 38 31 
34 37 30 


Unit: ns 


SWITCHING CHARACTERISTICS over COMMERCIAL operating range (Cont'd.) 


B. SETUP AND HOLD TIMES 


With Respect | 290332 
To 
1 


ee 
37 


30 
Eo. 
a 




























Parameter (Note 1) 







Byte Width Setup 


Byte Width Hold 
Instruction Setup 


| 43 

al 176 
os 17s 
Lede! 

47  Instruction Hold 
48  Width Setup      W0-W4 
49  Width Hold       W0-W4 
50  Position Setup   P0-P5 
oie! 
| 62 | 
Ee 
ce 
et 
ee 
oa 
Ee /m 
eee /m 
Lr 














Uv 


BOROM 
BOROW 

Macro Link Setup MLINK 
: 
M 










29C332-1 29C332-2 
3 







29C332 29C332-1 | 29C332-2 
Description 


Slave Mode 
Enable Time 









Y0-Y31, PY0-PY3 


Yo-Y31, PYo-# 











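Notes: 1. See timing diagram for desired mode of operation to determine clock edge to which these setup and 
hold times apply. 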
SWITCHING CHARACTERISTICS over MILITARY operating range 
A. COMBINATIONAL PROPAGATION DELAYS 


PA0 - PA3, PB0 - PB3 
DA0 - DA31, DB0 - DB31 
DA0 - DA31, DB0 - DB31 


DA0 - DA31, DB0 - DB31 
DA0 - DA31, DB0 - DB31 
DA0 - DA31, DB0 - DB31    MSERR 


75 


TUS REG 


C,Z,V,N,L 
CZVNL | 


CZ, VN L 57 


5 
C, Z, V,N,L 57 
z 


57 


oi 
N 


HOLD C,2Z,V,N, Lb 
HOLD 
SWITCHING CHARACTERISTICS over MILITARY operating range (Cont'd.) 





B. SETUP AND HOLD TIMES 


Parameter (Note 1) a 


arte Width Set 
aie With Hol 
oar =o5 a 


















i7- 
l7- 
Instruction Setup 


W4 
W4 





Width Hold 
Position Hold            P0-P5 

55  Macro Carry Hold     MCin 

56  Macro Link Setup     MLINK 

Macro Link Hold 


Macro/Micro Setup 


Macro/Micro Hold 













Hold Mode Setup 


Hold Mode Hold 











; 
— 









Max. 
Description Value 


Y0-Y31, PY0-PY3 Output Enable Time 
Y0-Y31, PY0-PY3 Output Disable Time 


Slave Mode 
C, Z, V, N, L, PERR Enable Time 
Y0-Y31, PY0-PY3 Slave Mode 
C, Z, V, N, L, PERR Disable Time 
Notes: 1. See timing diagram for desired mode of operation to determine clock edge to which these setup and 
hold times apply. 
SWITCHING TEST CIRCUIT 


[Test circuit diagram: VCC, switches S1, S2, S3, and load capacitance CL] 


A. Three-State Outputs 


Notes: 1. CL = 50 pF includes scope probe, wiring and stray capacitances without device in test fixture. 
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests. 
3. S1 and S3 are closed while S2 is open for tPZH test. 
S1 and S2 are closed while S3 is open for tPZL test. 
4. CL = TBD for output disable tests. 


SWITCHING TEST WAVEFORMS 


[Waveforms: Setup, Hold, and Release Times; Pulse Width (LOW-HIGH-LOW and HIGH-LOW-HIGH pulses, 0 V to 3 V, measured at 1.5 V)] 


Notes: 1. Diagram shown for HIGH data only. Output transition 
may be opposite sense. 
2. Cross hatched area is don't care condition. 
SWITCHING TEST WAVEFORMS (Cont'd.) 


[Waveforms: Propagation Delay (same-phase and opposite-phase input transitions); Enable and Disable Times (control input, output normally LOW, output normally HIGH)] 


Notes: 1. Diagram shown for Input Control Enable-LOW and Input Control 
Disable-HIGH. 
2. S1, S2 and S3 of Load Circuit are closed except where shown. 





Test Philosophy and Methods 


The following points give the general philosophy that we apply 
to tests that must be properly engineered if they are to be 
implemented in an automatic environment. The specifics of 
what philosophies applied to which test are shown. 


1. Ensure the part is adequately decoupled at the test head. 
Large changes in supply current when the device switches 
may cause function failures due to VCC changes. 


2. Do not leave inputs floating during any tests, as they may 
oscillate at high frequency. 


3. Do not attempt to perform threshold tests at high speed. 
Following an input transition, ground current may change by 
as much as 400 mA in 5 - 8 ns. Inductance in the ground 
cable may allow the ground pin at the device to rise by 
hundreds of millivolts momentarily. 


4. Use extreme care in defining input levels for AC tests. Many 
inputs may be changed at once, so there will be significant 
noise at the device pins that may not actually reach VIL or 
VIH until the noise has settled. AMD recommends using 
VIL ≤ 0 V and VIH ≥ 3 V for AC tests. 


5. To simplify failure analysis, programs should be designed to 
perform DC, Function, and AC tests as three distinct groups 
of tests. 


6. Capacitive Loading for AC Testing 


Automatic testers and their associated hardware have stray 
capacitance that varies from one type of tester to another, 
but is generally around 50 pF. This, of course, makes it 
impossible to make direct measurements of parameters 
that call for a smaller capacitive load than the associated 
stray capacitance. Typical examples of this are the so-called 
"float delays" which measure the propagation 
delays into and out of the high impedance state and are 
usually specified at a load capacitance of 5.0 pF. In these 
cases, the test is performed at the higher load capacitance 
(typically 50 pF) and engineering correlations based on 
data taken with a bench setup are used to predict the 
result at the lower capacitance. 
Similarly, a product may be specified at more than one 
capacitive load. Since the typical automatic tester is not 
capable of switching loads in mid-test, it is impossible to 
make measurements at both capacitances even though 
they may both be greater than the stray capacitance. In 
these cases, a measurement is made at one of the two 
capacitances. The result at the other capacitance is 
predicted from engineering correlations based on data 
taken with a bench setup and the knowledge that certain 
DC measurements (IOH, IOL, for example) have already 
been taken and are within specification. In some cases, 
special DC tests are performed in order to facilitate this 
correlation. 


7. Threshold Testing 


The noise associated with automatic testing, the long, 
inductive cables, and the high gain of bipolar devices when 
in the vicinity of the actual device threshold, frequently give 
rise to oscillations when testing high-speed circuits. 
These oscillations are not indicative of a reject device, but 
instead, of an overtaxed test system. To minimize this 
problem, thresholds are tested at least once for each input 
pin. Thereafter, "hard" HIGH and LOW levels are used for 
other tests. Generally this means that function and AC 
testing are performed at "hard" input levels rather than at 
VIL Max. and VIH Min. 


8. AC Testing 


Occasionally, parameters are specified that cannot be 
measured directly on automatic testers because of tester 
limitations. Data input hold times often fall into this catego- 
ry. In these cases, the parameter in question is guaranteed 
by correlating these tests with other AC tests that have 
been performed. These correlations are arrived at by the 
cognizant engineer by using data from precise bench 
measurements in conjunction with the knowledge that 
certain DC parameters have already been measured and 
are within specification. 

In some cases, certain AC tests are redundant since they 
can be shown to be predicted by other tests that have 
already been performed. In these cases, the redundant 
tests are not performed. 














SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS 


WAVEFORM    INPUTS                       OUTPUTS 

            MUST BE STEADY               WILL BE STEADY 
            MAY CHANGE FROM H TO L       WILL BE CHANGING FROM H TO L 
            MAY CHANGE FROM L TO H       WILL BE CHANGING FROM L TO H 
            DON'T CARE; ANY CHANGE       CHANGING; STATE UNKNOWN 
              PERMITTED 
            DOES NOT APPLY               CENTER LINE IS HIGH IMPEDANCE "OFF" STATE 










Setup and Hold Timing 
SWITCHING WAVEFORMS (Cont'd.) 


[Waveforms for INPUTS*, PERR, Y0-Y31, C, Z, N, V, L, and Status Register] 


Propagation Delays (SLAVE = LOW) 


Inputs: PAg - PA3, PBo - PB3, DAgp - DAg1, DBo -DB31, Ilo-—1g, Wo-Wa4, Po-Ps, CP, RS, 


MCin, MLINK, M/m, BOROW, HOLD 
SWITCHING WAVEFORMS (Cont'd.) 

[Figure WF023700: Propagation Delay (SLAVE = HIGH); signals shown: Y0-Y31; C, Z, N, V, L; PERR] 

[Figure WF023710] 

[Figure WF023720: Enable/Disable II (OE-Y = LOW)] 


INPUT/OUTPUT CIRCUIT DIAGRAMS 

[Figure IC000861: Driven input]      [Figure IC000871: Output] 

CI ≈ 5.0 pF, all inputs              CO ≈ 5.0 pF, all outputs 
Am29C334 


CMOS Four-Port Dual-Access Register File 


PRELIMINARY 


DISTINCTIVE CHARACTERISTICS 


● 64 x 18-Bit Wide Register File 
  The Am29C334 is a 64 x 18-bit, dual-access RAM with 
  two read ports and two write ports. 

● Pipelined Data Path 
  The Am29C334 can be configured to support either a 
  non-pipelined data path (similar to the Am29334) or a 
  pipelined data path. 

● Cascadable 
  The Am29C334 is cascadable to support either wider 
  word widths, deeper register files, or both. 

● Built-in Forwarding Logic 
  The Am29C334 provides simultaneous read/write access 
  to the same address for double pipelined systems. 

● Byte Parity Storage 
  Width of 18 bits facilitates byte parity storage for each 
  port and provides consistency with the Am29C332 
  32-bit ALU. 

● Byte Write Capability 
  Individual byte-write enables allow byte or full word 
  write. 

BLOCK DIAGRAMS 

[Figure BD003022: 64 x 18 dual-access RAM] 

[Figure BD007021: Pipelined Mode] 


Publication # 08786    Rev. B    Amendment /0 
Issue Date: December 1987 

GENERAL DESCRIPTION 


The Am29C334 is a 64-word by 18-bit dual-access RAM with 
two read ports and two write ports. Two independent, simulta- 
neous accesses are possible and each access can be either a 
read or a write. It is designed to be used in a system that 
requires as many as two reads and two writes in a single cycle. 
The device can be configured to support either a non- 
pipelined data path or a pipelined data path. 


The Am29C334 is also fully compatible with the bipolar 
Am29334. When the device is connected to the pinout 
specified for the Am29334, it will appear as a 64-word by 18- 
bit array without support for pipelined operation. The pipelined 
operation of the Am29C334 is made possible because of the 
availability of unused power pins not required by the CMOS 
part. The pipelined operation is disabled by attaching the PIPE 
pin to Vcc. 


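RELATED AMD PRODUCTS 

Part No.     Description 
Am29C325     CMOS 32-Bit Floating-Point Processor 
Am29331      16-Bit Microprogram Sequencer 
Am29C331     CMOS 16-Bit Microprogram Sequencer 
Am29332      32-Bit Arithmetic Logic Unit 
Am29C332     CMOS 32-Bit Arithmetic Logic Unit 
Am29334      64 x 18 Four-Port Dual-Access Register File 
Am29337      16-Bit Bounds Checker 
Am29338      128 x 9 Byte Queue 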
CONNECTION DIAGRAM 
120 Lead PGA* 

[Figure CD010320: pin-grid map of the 120-lead PGA] 

*Pins facing up. 


TABLE OF INTERCONNECTIONS 

[Tables: pad number versus pin name for the 120-lead PGA, sorted by pin name and by pin number] 


LOGIC SYMBOL 

[Figure: logic symbol] 




METALLIZATION AND PAD LAYOUT 

[Figure LS00222: metallization and pad layout] 
ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid 
Combination) is formed by a combination of: 
    a. Device Number 
    b. Speed Option (if applicable) 
    c. Package Type 
    d. Temperature Range 
    e. Optional Processing 


AM29C334   -1   G   C   B 


a. DEVICE NUMBER/DESCRIPTION 
   Am29C334 
   CMOS Four-Port Dual-Access Register File 

b. SPEED OPTION 
   -1 = Speed Select 

c. PACKAGE TYPE 
   G = 120-Lead Pin Grid Array without Heatsink (CGX120) 

d. TEMPERATURE RANGE 
   C = Commercial (0 to +70°C) 

e. OPTIONAL PROCESSING 
   Blank = Standard processing 
   B = Burn-in 


Valid Combinations 

AM29C334        GC, GCB 
AM29C334-1 

Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations, to check on newly released valid combinations, 
and to obtain additional data on AMD's standard military 
grade products. 
ORDERING INFORMATION (Cont'd.) 
APL Products 


AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved 
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL 
products is formed by a combination of: 
    a. Device Number 
    b. Speed Option (if applicable) 
    c. Device Class 
    d. Package Type 
    e. Lead Finish 


AM29C334   /B   Z   C 


a. DEVICE NUMBER/DESCRIPTION 
   Am29C334 
   CMOS Four-Port Dual-Access Register File 

b. SPEED OPTION 
   Not Applicable 

c. DEVICE CLASS 
   /B = Class B 

d. PACKAGE TYPE 
   Z = 120-Lead Pin Grid Array without Heatsink (CGX120) 

e. LEAD FINISH 
   C = Gold 


Valid Combinations 

Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations or to check for newly released valid 
combinations. 


Group A Tests 

Group A tests consist of Subgroups 
1, 2, 3, 7, 8, 9, 10, 11. 
PIN DESCRIPTION 


ARA0-ARA5    Read Address A-Side (Input) 
The 6-bit read address input selects one of the 64 memory 
locations for output to the YA Data Latch. 


ARB0-ARB5    Read Address B-Side (Input) 
The 6-bit read address input selects one of the 64 memory 
locations for output to the YB Data Latch. 


AWA0-AWA5    Write Address A-Side (Input) 
The 6-bit write address input selects one of the 64 memory 
locations for writing new data from the DA input. 


AWB0-AWB5    Write Address B-Side (Input) 
The 6-bit write address input selects one of the 64 memory 
locations for writing new data from the DB input. 


DA0-DA17    Data A-Side (Input) 
New data is written into memory from this input, as selected 
by the AWA address input. 


DB0-DB17    Data B-Side (Input) 
New data is written into memory from this input, as selected 
by the AWB address input. 


GND, VCC    Power 
Power supply for the internal logic (0, 5 V). 


GNDA, VCCA    Power 
Power supply for the output drivers (0, 5 V). 


LEA    YA Data Latch Enable (Input, Active HIGH) 
The LEA input controls the latch for the YA output port. 
When LEA is HIGH, the latch is open (transparent) and data 
from the RAM, as selected by the ARA address inputs, is 
passed to the YA output. When LEA is LOW, the latch is 
closed and it retains the last data read from the RAM. LEA is 
disabled in the pipelined mode. 


LEB    YB Data Latch Enable (Input, Active HIGH) 
The LEB input controls the latch for the YB output port. 
When LEB is HIGH, the latch is open (transparent), and data 
from the RAM, as selected by the ARB address inputs, is 
passed to the YB output. When LEB is LOW, the latch is 
closed and it retains the last data read from the RAM. LEB is 
disabled in the pipelined mode. 


OEA    YA Output Enable (Input, Active LOW) 
When OEA is LOW, data in the YA Data Latch is driven on 
the YA outputs. When OEA is HIGH, the YA outputs are in the 
high-impedance (off) state. 


OEB    YB Output Enable (Input, Active LOW) 
When OEB is LOW, data in the YB Data Latch is driven on 
the YB outputs. When OEB is HIGH, the YB outputs are in the 
high-impedance (off) state. 


PIPE    Pipeline Enable (Input, Active LOW) 
When PIPE is LOW, the input and output registers are 
enabled, allowing for pipelined operation. When HIGH, 
these registers are made transparent. 


WEAC/CLKA    Write Enable A-Side Common (Input, 
Active LOW) 
When WEAC is LOW together with WEAH or WEAL, new 
data is written into the location selected by the AWA 
address. When WEAC is HIGH, no data is written into the 
RAM through the A port. WEAC acts as a clock input in the 
pipeline mode for the A side. 


WEBC/CLKB    Write Enable B-Side Common (Input, 
Active LOW) 
When WEBC is LOW together with WEBH or WEBL, new 
data is written into the location selected by the AWB 
address. When WEBC is HIGH, no data is written into the 
RAM through the B port. WEBC acts as a clock input in the 
pipeline mode for the B side. 


WEAH    High-Byte Write Enable A-Side (Input, Active 
LOW) 
When WEAH is LOW together with WEAC, new data is 
written into the high byte of the location selected by the 
AWA address input. When WEAH is HIGH, no data is written 
into the high byte. 


WEBH    High-Byte Write Enable B-Side (Input, Active 
LOW) 
When WEBH is LOW together with WEBC, new data is 
written into the high byte of the location selected by the 
AWB address input. When WEBH is HIGH, no data is written 
into the high byte. 


WEAL    Low-Byte Write Enable A-Side (Input, Active 
LOW) 
When WEAL is LOW together with WEAC, new data is 
written into the low byte of the location selected by the AWA 
address input. When WEAL is HIGH, no data is written into 
the low byte. 


WEBL    Low-Byte Write Enable B-Side (Input, Active 
LOW) 
When WEBL is LOW together with WEBC, new data is 
written into the low byte of the location selected by the AWB 
address input. When WEBL is HIGH, no data is written into 
the low byte. 


YA0-YA17    Data Latch (Outputs, Three-State) 
The 18-bit YA Data Latch outputs. 


YB0-YB17    Data Latch (Outputs, Three-State) 
The 18-bit YB Data Latch outputs. 
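The latch-enable and output-enable behavior in the pin descriptions can be summarized in a few lines of Python. This is only a behavioral sketch for the non-pipelined mode, not a timing model; the class name `OutputPort`, the `update` method, and the use of `None` to stand for the high-impedance state are illustrative assumptions.

```python
class OutputPort:
    """Behavioral sketch of one Y output port (non-pipelined mode)."""

    def __init__(self):
        self.latched = 0  # last value captured by the output latch

    def update(self, ram_data, le, oe_n):
        """Apply one set of input levels and return the value on the Y pins.

        le   -- latch enable, active HIGH (1 = transparent, 0 = hold)
        oe_n -- output enable, active LOW (0 = drive, 1 = high impedance)
        """
        if le:
            # Latch open: RAM data passes through and is captured.
            self.latched = ram_data & 0x3FFFF
        # None is used here to represent the high-impedance (off) state.
        return None if oe_n else self.latched
```

With LE HIGH the port follows the RAM data; dropping LE LOW freezes the last value, and raising OE turns the drivers off without disturbing the latch contents.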
FUNCTIONAL DESCRIPTION 


The heart of the Am29C334 is a high-speed 64-word by 18-bit 
dual RAM cell array. Six write enables permit the RAM word to 
be written in one or both of its 9-bit bytes. Data to be written is 
presented to each side of the RAM array through the two data 
ports (DA and DB). 

The remainder of the logic surrounding the RAM array 
supports pipelining the RAM access and providing a forwarding 
path for data around the RAM. This forwarding path is 
needed to eliminate the latency cycle associated with consecutive 
write/read accesses to the same memory location in a 
pipelined system. 

Pipelining of the RAM is controlled by the PIPE pin. When not 
asserted (i.e., in non-pipelined mode) the registers on the 
inputs (write ports DA/B, write addresses AWA/B, and write 
enables WEAC/BC) are made fully transparent, while the 
registers at the outputs (the read ports YA/B) are turned into 
latches, controlled by the latch enables LEA/B. 

In either mode of operation, each side of the RAM is controlled 
by its individual control signals. This means that the two sides 
of the RAM can operate at different clock rates to one 
another. In the pipelined mode, these clock rates must have a 
known relationship between each other. 

In the non-pipelined mode, there is no need for a relationship 
between the clock rates. Two special cases of operation arise 
because of this. The first is where the location written to by 
one side is being read from the other side. In this case, known 
as A-to-B transparency, the value read is the value being 
written. The second occurs when two writes to the same 
location occur at the same time. In this case the value written 
cannot be defined, but the operation is not harmful to the 
device. 

The transparency mode (A-A or B-B) during a write 
(WEA = LOW) allows the data in (DA) to not only be written 
into memory, but also to appear at the output (YA) when the 
output latch (LEA) is HIGH and the output enable control 
(OEA) is LOW. 


Extensions to Four Read Ports and Two Write Ports 

A RAM with four read ports and two write ports can be made 
by using two dual-access RAMs and connecting each of the 
write ports, write addresses, and write enables in parallel for 
the two devices. Figure 2 details this in a non-pipelined mode. 
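The byte-write enables and the two-device extension described above can be illustrated with a small behavioral model. The sketch below assumes that the "high byte" is bits 9-17 and the "low byte" is bits 0-8 of the 18-bit word; the class and method names are hypothetical, and no timing or transparency behavior is modeled.

```python
MASK9 = 0x1FF      # one 9-bit byte (8 data bits plus a parity bit)
MASK18 = 0x3FFFF   # one 18-bit word


class RegisterFile:
    """Behavioral sketch of a 64-word x 18-bit array with byte writes."""

    def __init__(self, words=64):
        self.mem = [0] * words

    def write(self, addr, data, high=True, low=True):
        """Write the selected 9-bit byte(s); unselected bytes are kept."""
        word = self.mem[addr]
        if low:
            word = (word & ~MASK9) | (data & MASK9)
        if high:
            word = (word & MASK9) | (data & (MASK9 << 9))
        self.mem[addr] = word & MASK18

    def read(self, addr):
        return self.mem[addr]


class FourReadTwoWrite:
    """Two devices written in parallel, giving four independent read ports."""

    def __init__(self):
        self.devices = (RegisterFile(), RegisterFile())

    def write(self, addr, data, high=True, low=True):
        # Write data, address, and enables go to both devices together,
        # so both arrays always hold identical contents.
        for dev in self.devices:
            dev.write(addr, data, high, low)

    def read(self, port, addr):
        # Read ports 0-1 are served by device 0, ports 2-3 by device 1.
        return self.devices[port // 2].read(addr)
```

Because every write reaches both arrays, any of the four read ports returns the same data for a given address, which is what lets the pair behave as a single four-read-port file.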













[Figure: system block diagram; blocks shown include the Am29C331 16-bit sequencer, microprogram memory, pipeline register, Am29C334 64 x 18 register file, Am29C325 32-bit floating-point processor, Am29C332 32-bit ALU, Am29C323 32 x 32 parallel multiplier, and control signals] 

Figure 1. Am29C300 CMOS Family High-Performance System Block Diagram 





32 Word x 36 Bit Single-Access RAM 

It is possible to convert the 64 word x 18 bit dual-access RAM 
into a 32 word x 36 bit single-access RAM. This is performed 
by storing the upper half of the 36 bits in the upper half of the 
64 words and addressing these from the A side, and storing 
the lower half of the 36 bits in the lower half of the 64 words 
and addressing these from the B side. This arrangement does 
not change the capacity of the RAM, but the dual access is 
lost (see Figure 4). 


Operational Modes 

The Am29C334 may be configured in a non-pipelined mode or 
in a pipelined mode by controlling the PIPE pin. This mode is 
selected via hardwiring the pin to either LOW or HIGH. This 
option should not be changed during operation. 


Non-Pipelined Data Path 

In non-pipelined mode (PIPE = 1), the Am29C334 is a flow-through 
device; data is read out, used, and written back all in 
the same cycle. In this mode all the registers are made 
transparent except the registers at the two read ports that are 
configured as latches. The read port latches are controlled 
individually by LEA and LEB, so that they are transparent 
when the latch enables are HIGH and retain the data when the 
latch enables are LOW. The "forwarding logic" incorporated 
to support the pipelined mode of operation is also disabled in 
this mode of operation (specifically, the address comparators 
are disabled). 

In the non-pipelined mode of operation it is possible to 
simultaneously read two ports, read one port and write to the 
other, or write to two ports. The read and write 
addresses are internally multiplexed on each side. The selection 
of the read and write addresses is controlled by the 
exclusive-OR of the PIPE pin and WEAC/BC. Normally, the 
WEAC/BC are connected to the system clock. With PIPE 
de-asserted, the read address will be selected in the high part of 
the clock cycle (WEAC/BC = 1) and the write address selected 
only in the low part. Byte selection for writing on either port is 
controlled by the WEH/L pins. 
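The address selection rule can be written as a one-line function. The sketch below assumes that an exclusive-OR result of 1 selects the write address; that polarity is inferred from the statement that, with PIPE HIGH, the read address is selected while WEC is HIGH. The function and parameter names are illustrative only.

```python
def selected_address(pipe, we_c, read_addr, write_addr):
    """Return the address presented to one side of the RAM array.

    pipe -- logic level on the PIPE pin (1 = non-pipelined)
    we_c -- logic level on WEAC/CLKA or WEBC/CLKB
    The XOR of the two levels picks the write address when it is 1
    (assumed polarity) and the read address when it is 0.
    """
    return write_addr if (pipe ^ we_c) else read_addr


# Non-pipelined mode (PIPE = 1): read address in the HIGH part of the
# clock, write address in the LOW part.
high_phase = selected_address(pipe=1, we_c=1, read_addr=3, write_addr=9)
low_phase = selected_address(pipe=1, we_c=0, read_addr=3, write_addr=9)
```

Under the same assumed polarity, tying PIPE LOW swaps the two clock phases, since the exclusive-OR then follows the clock level directly.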


Two interesting cases arise as a result of the dual access 
capability. The first occurs if a location is written into by one 
side while it is being read out by the other side. In this case, 
known as A-to-B transparency, the data being written will 
appear on the read port after the Transparency A-B time (if 
other read access time parameters are met). The second case 
of interest occurs if both sides write to the same location at the 
same time. The value written as a result of this operation 
cannot be defined. 


Pipelined Data Path 


The Am29C334 can be configured in a pipelined system by 
asserting the PIPE signal (PIPE = 0) and adding an additional 
external register in the write address and the write control path 
on both A and B ports as shown in Figure 3. The registers on 
each side are controlled by separate clocks that are supplied 
over the WEac and WEgc pins. 


Typically, in a pipelined system a read-modify-write would 
span three cycles. In the second half of the first cycle, a read 
of the operand(s) is performed and the data is clocked into the 
output registers at the end of the cycle. In the second cycle, 
the operation is performed on the operands and the result is 
clocked into the data register on the write port at the end of 
the second cycle. In the first half of the third cycle, the data is 
written to the register file. Therefore, in any cycle, a pipelined 
system is writing the result of instruction n (in the first half), 
executing instruction n + 1, and reading the operands needed 
in instruction n + 2. In any case, a write operation followed by 
a read operation is performed in the RAM in a cycle. 


Figure 2. RAM with Four Read Ports and Two Write Ports for Non-pipelined Mode 
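The three-cycle overlap described for the pipelined data path can be tabulated with a short helper. This is a sketch under the assumption that instructions are numbered from 0 and that instruction i reads its operands in cycle i; the function name and stage labels are illustrative.

```python
def stage_occupancy(cycle):
    """Map each pipeline stage to the instruction index it handles.

    Instruction i reads operands in cycle i, executes in cycle i + 1,
    and writes its result back in cycle i + 2. Stages that are still
    empty at the start of the program are omitted.
    """
    stages = {"read": cycle, "execute": cycle - 1, "write": cycle - 2}
    return {stage: idx for stage, idx in stages.items() if idx >= 0}


for c in range(4):
    print(c, stage_occupancy(c))
```

From the third cycle onward every cycle contains one write-back, one execute, and one operand read, matching the n, n + 1, n + 2 pattern in the text.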


A special case arises if the data to be written by the previous 
instruction is needed in the next instruction as an operand. 
Due to the pipeline register being at its write port, the location 
is not written into until the next cycle, and hence only the 
previous value is available in the current cycle. To overcome 
this problem, "forwarding logic" is included as shown in the 
block diagram. This logic consists of three elements: an 
address comparator, an AND gate, and a three-to-one multi- 
plexer, as shown. If the read address of the current instruction 
is the same as the write address of the previous instruction, 
and if the result is to be written, then the data to be written is 
forwarded by the forwarding multiplexer to the output regis- 
ters. Since there are two write ports, forwarding paths on both 
ports are provided. As each write port has byte write capability, 
the forwarding is further broken into the upper and lower 
bytes. 
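The forwarding decision (address comparator, AND with the write enable, and a per-byte three-to-one multiplexer) can be sketched as follows. The tuple layout for a pending write, the bit positions of the two bytes, and the function name are assumptions made for illustration; the case where both write ports target the same address is left undefined in the data sheet, and this sketch simply lets the later list entry win.

```python
MASK9 = 0x1FF  # one 9-bit byte


def forward_read(ram_word, read_addr, pending):
    """Return the 18-bit value delivered to an output register.

    pending -- one entry per write port (A, B): either None or a tuple
               (write_addr, data, we_high, we_low) describing the write
               still held in that port's pipeline register.
    """
    out = ram_word
    for entry in pending:
        if entry is None:
            continue
        write_addr, data, we_high, we_low = entry
        if write_addr != read_addr:       # address comparator
            continue
        if we_low:                        # forward the low byte only if written
            out = (out & ~MASK9) | (data & MASK9)
        if we_high:                       # forward the high byte only if written
            out = (out & MASK9) | (data & (MASK9 << 9))
    return out & 0x3FFFF
```

Forwarding each byte separately means that a byte write followed by a full-word read returns the new byte merged with the stored value of the other byte.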


Since each side has its own WEc/CLK control, it is possible to 
clock each side of the chip differently. However, if the part is 
used at different frequencies, the forwarding cannot be 
guaranteed unless the addresses compared are held valid 
long enough to allow for a comparison to be made and the 
results of the forwarding setup on the output register. 


As mentioned earlier, a pipelined system needs an external write 
address register and external write control registers. 
These registers have not been included on the chip for two reasons. 
First, keeping them external makes it possible for the user to abort 
a write before it fills the internal pipe. This situation may arise in 
cases such as "traps." Second, an external write address 
register gives the flexibility of obtaining the write address 
from several sources by using an external multiplexer. 
Figure 3. System Diagram With the Am29C334 in a Double Pipelined Data Path 

Figure 4. 32 x 36 RAM (Single Access) Using 64 x 18 Dual-Access RAM 
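The 32 x 36 arrangement of Figure 4 amounts to splitting each 36-bit word across two 18-bit locations. The sketch below assumes that the "upper half of the 64 words" means addresses 32-63 and the "lower half" addresses 0-31; the class name and the flat list standing in for the RAM array are illustrative.

```python
class SingleAccess32x36:
    """Sketch of a 32-word x 36-bit RAM built on a 64 x 18 array."""

    def __init__(self):
        self.mem = [0] * 64  # stands in for the 64 x 18 dual-access array

    def write(self, addr, value):
        if not 0 <= addr < 32:
            raise ValueError("address must be in 0..31")
        # A side: upper 18 bits go to the upper half of the array.
        self.mem[32 + addr] = (value >> 18) & 0x3FFFF
        # B side: lower 18 bits go to the lower half of the array.
        self.mem[addr] = value & 0x3FFFF

    def read(self, addr):
        if not 0 <= addr < 32:
            raise ValueError("address must be in 0..31")
        return (self.mem[32 + addr] << 18) | self.mem[addr]
```

Both sides are busy on every access, which is why the total capacity is unchanged while the dual access is given up.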
ABSOLUTE MAXIMUM RATINGS 

Storage Temperature                      -65 to +150°C 
Temperature Under Bias - TC              -55 to +125°C 
Supply Voltage to Ground Potential 
  Continuous                             -0.3 to +7.0 V 
DC Voltage Applied to Outputs 
  for HIGH Output State                  -0.3 V to +VCC + 0.3 V 
DC Input Voltage                         -0.3 V to +VCC + 0.3 V 
DC Output Current, Into LOW Outputs 
DC Input Current                         -10 mA to +10 mA 

Stresses above those listed under ABSOLUTE MAXIMUM 
RATINGS may cause permanent device failure. Functionality 
at or above these limits is not implied. Exposure to absolute 
maximum ratings for extended periods may affect device 
reliability. 


OPERATING RANGES 

Commercial (C) Devices 
  Temperature (TA)                       0 to +70°C 
  Supply Voltage                         +4.75 to +5.25 V 

Military* (M) Devices 
  Temperature (TA)                       -55 to +125°C 
  Supply Voltage (VCC)                   +4.5 to +5.5 V 

Operating ranges define those limits between which the 
functionality of the device is guaranteed. 


DC CHARACTERISTICS over operating range unless otherwise specified (for APL Products, Group A, 
Subgroups 1, 2, 3 are tested unless otherwise noted) 


Parameter   Parameter                        Test Conditions 
Symbol      Description                      (Note 1) 

VOH         Output HIGH Voltage              VCC = Min., VIN = VIL or VIH,          2.4 Volts 
                                             IOH = -4 mA 
VOL         Output LOW Voltage               VCC = Min., VIN = VIL or VIH,          Volts 
                                             IOL = 8 mA 
            Guaranteed Input Logical 
            Guaranteed Input Logical 
                                             VCC = Max. 
            Off State (High-Impedance)       = 2.4 V 
ICC         Static Power Supply Current      VCC = Max., VIN = VCC or GND,          T = -55 to +125°C 
                                             IO = 0 µA                              TA = 0 to +70°C 
            Power Dissipation Capacitance    VCC = 5.0 V, TA = 25°C, No Load        200 
            (Note 3) 


Notes: 1. VCC conditions shown as Min. or Max. refer to the commercial (±5%) VCC limits. 
2. These input levels provide zero-noise immunity and should only be statically tested in a noise-free environment (not functionally 
tested). 
3. CPD determines the no-load dynamic current consumption: 
ICC (Total) = ICC (Static) + CPD × VCC × f, where f is the switching frequency of the majority of the internal nodes, normally one-half 
of the clock frequency. This specification is not tested. 
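As a worked illustration of the Note 3 formula, the following Python sketch evaluates the total supply current. The numeric values used in the example call are hypothetical placeholders chosen only to show the arithmetic and are not data-sheet limits; the function names are likewise illustrative.

```python
def icc_total(icc_static, c_pd, vcc, f_switch):
    """ICC (Total) = ICC (Static) + CPD * VCC * f, in SI units.

    icc_static -- static supply current in amperes
    c_pd       -- power dissipation capacitance in farads
    vcc        -- supply voltage in volts
    f_switch   -- switching frequency of most internal nodes in hertz
    """
    return icc_static + c_pd * vcc * f_switch


def switching_frequency(clock_hz):
    """Note 3 states f is normally one-half of the clock frequency."""
    return clock_hz / 2.0


# Hypothetical example values (not taken from the data sheet):
# 200 pF, 5.0 V, 10 MHz clock, negligible static current.
example = icc_total(0.0, 200e-12, 5.0, switching_frequency(10e6))
```

With these placeholder numbers the dynamic term is 200 pF × 5 V × 5 MHz, or 5 mA, which shows how the dynamic current scales linearly with clock rate.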
SWITCHING CHARACTERISTICS over COMMERCIAL operating range unless otherwise specified 


NON-PIPELINED MODE (Note 1) 

No.   Parameter Description                  Test Conditions 

      Access Time 
2     Access Time                            WEAC or WEBC to … 
      Turn-On Time                           OEA or OEB ↓ to YA or YB Active            0 
      Turn-Off Time (Note 2)                 OEA or OEB ↑ to YA or YB = High Impedance 
      Enable Time                            LEA or LEB ↑ to YA or YB 
9     Data Setup Time                        DA or DB to WEA or WEB ↑ 
10    Data Hold Time                         DA or DB to WEA or WEB ↑ 
11    Address Setup Time                     AWA or AWB to WEA or WEB ↓ 
12    Address Hold Time                      AWA or AWB to WEA or WEB ↑ 
13    Address Setup Time                     ARA or ARB to LEA or LEB ↑ 
14    Address Hold Time                      ARA or ARB to LEA or LEB ↓ 
15    Latch Close Before Write               LEA or LEB to WEA or WEB ↓ 
16    Read Before Latch Close                WEAC or WEBC to LEA or LEB ↓ 
17    Write Pulse Width                      WEA or WEB (LOW) 
18    Latch Data Capture Pulse Width         LEA or LEB (HIGH) 

Notes: See notes following Military table. 
SWITCHING CHARACTERISTICS over MILITARY operating range unless otherwise specified (for APL 
Products, Group A, Subgroups 9, 10, 11 are tested unless otherwise noted) 


NON-PIPELINED MODE (Note 1) 

No.   Parameter Description                  Test Conditions 

      Turn-On Time                           OEA or OEB ↓ to YA or YB Active 
      Turn-Off Time (Note 2)                 OEA or OEB ↑ to YA or YB = High Impedance  25 
7     Write Recovery Time                    ARA or ARB to WEAC or WEBC, 
                                             LEA or LEB = H 
10    Data Hold Time                         DA or DB to WEA or WEB ↑ 
11    Address Setup Time                     AWA or AWB to WEA or WEB ↓ 
12    Address Hold Time                      AWA or AWB to WEA or WEB ↑ 
13    Address Setup Time                     ARA or ARB to LEA or LEB ↑ 
14    Address Hold Time                      ARA or ARB to LEA or LEB ↓ 
15    Latch Close Before Write               LEA or LEB to WEA or WEB ↓ 
16    Read Before Latch Close                WEAC or WEBC to LEA or LEB ↓ 
17    Write Pulse Width                      WEA or WEB (LOW) 
18    Latch Data Capture Pulse Width         LEA or LEB (HIGH) 


Notes: 1. WEA = WEAC + WEAL/H 
          WEB = WEBC + WEBL/H 
2. YA and YB are tested independently. 
3. Minimum delays are not tested. 
SWITCHING WAVEFORMS 
NON-PIPELINED MODE 

[Figure WF023330: Read Function (* means A or B)] 

[Figure WF023340: Write Function (* means A or B)] 

[Figure WF023320: Transparency] 
SWITCHING CHARACTERISTICS over COMMERCIAL operating range (Cont'd.) 
PIPELINED MODE 

No.   Parameter Description                  Test Conditions 

19    Write Data Setup Time                  DA or DB to CLKA or CLKB ↑ 
20    Write Data Hold Time                   DA or DB to CLKA or CLKB ↑ 
21    Write Address Setup Time               AWA or AWB to CLKA or CLKB ↑ 
22    Write Address Hold Time                AWA or AWB to CLKA or CLKB ↑ 
23    Write Enable Setup Time                WEH or WEL to CLKA or CLKB ↑ 
24    Write Enable Hold Time                 WEH or WEL to CLKA or CLKB ↑ 
25    Read Address Setup Time                ARA or ARB to CLKA or CLKB ↑ 
26    Read Address Hold Time                 ARA or ARB to CLKA or CLKB ↑ 
      Minimum Clock Cycle                    CLKA or CLKB 
      Clock Pulse Width                      CLKA or CLKB (HIGH) 
      Clock Pulse Width                      CLKA or CLKB (LOW) 
SWITCHING CHARACTERISTICS over MILITARY operating range (Cont'd.) 
PIPELINED MODE 

                                                                          29C334 
No.   Parameter Description                  Test Conditions              Min.   Max. 

19    Write Data Setup Time                  DA or DB to CLKA or CLKB ↑ 
      Read Address Hold Time                 ARA or ARB to CLKA or CLKB ↑ 0 
      Clock to Y                             YA or YB to CLKA or CLKB 

SWITCHING WAVEFORMS (Cont'd.) 
PIPELINED MODE 

[Figure WF023310: pipelined mode timing (* means A or B)] 

[Figure: input and clock test waveforms, 0 V to 3.0 V swing, referenced at 1.5 V] 
SWITCHING TEST CIRCUIT 

[Figure TC003424: switching test circuit with load capacitance CL and switches S1, S2, S3] 

Notes: 1. CL = 50 pF includes scope probe, wiring and 
          stray capacitances without device in test fixture. 
       2. S1, S2, S3 are closed during function tests 
          and all AC tests except output enable tests. 
       3. S1 and S3 are closed while S2 is open for 
          tPZH test. S1 and S2 are closed while S3 is 
          open for tPZL test. 
       4. CL = TBD for output disable tests. 


KEY TO SWITCHING WAVEFORMS (KS000010) 

INPUTS                               OUTPUTS 

MUST BE STEADY                       WILL BE STEADY 
MAY CHANGE FROM H TO L               WILL BE CHANGING FROM H TO L 
MAY CHANGE FROM L TO H               WILL BE CHANGING FROM L TO H 
DON'T CARE; ANY CHANGE PERMITTED     CHANGING; STATE UNKNOWN 
DOES NOT APPLY                       CENTER LINE IS HIGH IMPEDANCE "OFF" STATE 


INPUT/OUTPUT CIRCUIT DIAGRAMS 

[Figure IC000861: Driven input]      [Figure IC000870: Output] 

CI ≈ 5.0 pF, all inputs              CO ≈ 5.0 pF, all outputs 
Am29C325 


CMOS 32-Bit Floating-Point Processor 


ADVANCE INFORMATION 


DISTINCTIVE CHARACTERISTICS 


● Single VLSI device performs high-speed floating-point 
  arithmetic 
  - Floating-point addition, subtraction, and multiplication 
    in a single clock cycle 
  - Internal architecture supports sum-of-products, 
    Newton-Raphson division 
● 32-bit, three-bus flow-through architecture 
  - Programmable I/O allows interface to 32- and 16-bit 
    systems 
● IEEE and DEC formats 
  - Performs conversions between formats 
  - Performs integer <-> floating-point conversions 
● Input and output registers can be made transparent 
  independently 
● Pin and functionally compatible with the Bipolar 
  Am29325 
● The Am29C325 uses less than one-quarter the power of 
  the Am29325 
● 145-lead PGA requires no heatsink 


GENERAL DESCRIPTION 


The Am29C325 is a high-speed floating-point processor 
unit. It performs 32-bit single-precision floating-point addition, 
subtraction, and multiplication operations in a single 
VLSI circuit, using the format specified by the proposed 
IEEE floating-point standard, 754. The DEC single-precision 
floating-point format is also supported. Operations for 
conversion between 32-bit integer format and floating-point 
format are available, as are operations for converting 
between the IEEE and DEC floating-point formats. Any 
operation can be performed in a single clock cycle. Six 
flags (invalid operation, inexact result, zero, not-a-number, 
overflow, and underflow) monitor the status of operations. 


The Am29C325 has a three-bus, 32-bit architecture, with 
two input buses and one output bus. This configuration 
provides high I/O bandwidth, allows access to all buses, 
and affords a high degree of flexibility when connecting this 
device in a system. All buses are registered, with each 
register having a clock enable. Input and output registers 
may be made transparent independently. Two other I/O 
configurations, a 32-bit, two-bus architecture and a 16-bit, 
three-bus architecture, are user-selectable, easing interface 
with a wide variety of systems. Thirty-two-bit internal 
feedforward datapaths support accumulation operations, 
including sum-of-products and Newton-Raphson division. 
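The two accumulation patterns named above can be expressed using only multiply, add, and subtract steps, which is why a multiply/add datapath with feedback suits them. The Python sketch below is purely arithmetic and does not model the device's instruction set or its single-precision rounding; the function names and the caller-supplied seed `x0` (for instance from a lookup table) are assumptions for illustration.

```python
def sum_of_products(xs, ys):
    """Accumulate x[i] * y[i], one multiply and one add per term."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = acc + x * y   # running sum fed back into the next step
    return acc


def nr_divide(a, b, x0, iterations=4):
    """Approximate a / b by Newton-Raphson refinement of 1 / b.

    Each pass computes x = x * (2 - b * x), which uses two multiplies
    and one subtract. x0 is an initial estimate of 1 / b and must be
    close enough for the iteration to converge (0 < b * x0 < 2).
    """
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - b * x)
    return a * x


quotient = nr_divide(1.0, 3.0, x0=0.3)
```

The relative error of the reciprocal estimate is squared on each pass, so a seed accurate to a few bits reaches full precision in a handful of iterations.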


Fabricated using Advanced Micro Devices' 1.2 micron 
CMOS process, the Am29C325 is powered by a single 5- 
volt supply. The device is housed in a 145-lead pin-grid- 
array package. 


Am29C300 FAMILY HIGH-PERFORMANCE SYSTEM BLOCK DIAGRAM 

[Figure: system block diagram; blocks shown include the Am29C331 16-bit sequencer, microprogram memory, pipeline register, Am29C332 32-bit ALU, and control signals] 


This document contains information on a product under development at Advanced Micro 
Devices, Inc. The information is intended to help you to evaluate this product. AMD 
reserves the right to change or discontinue work on this product without notice. 
Am29C327 


CMOS Double-Precision Floating-Point Processor 


ADVANCE INFORMATION 


DISTINCTIVE CHARACTERISTICS 


● High-performance double-precision floating-point 
  processor 
● Comprehensive floating-point and integer instruction 
  sets 
● Single VLSI device performs single-, double-, and 
  mixed-precision operations 
● Performs conversions between precisions and between 
  data formats 
● Compatible with industry-standard floating-point formats 
  - IEEE 754 format 
  - DEC F, DEC D, and DEC G formats 
  - IBM System/370 format 
● Exact IEEE compliance for denormalized numbers with 
  no speed penalty 
● Eight-deep register file for intermediate results and 
  on-chip 64-bit data path facilitates compound operations; 
  e.g., Newton-Raphson division, sum-of-products, and 
  transcendentals 
● Supports pipelined or flow-through operation 
● Fabricated with Advanced Micro Devices' 1.2 micron 
  CMOS process 


SIMPLIFIED SYSTEM DIAGRAM 

[Figure: simplified system diagram; blocks shown include constants, ALU input multiplexer, floating-point and integer ALU, F-register, and output multiplexer, with 32- and 64-bit data paths] 


DEC F, DEC D, DEC G, and VAX are trademarks of the Digital Equipment Corporation. 
IBM System/370 is a trademark of International Business Machines, Inc. 
Am29331 


16-Bit Microprogram Sequencer 


DISTINCTIVE CHARACTERISTICS 


● 16 Bits Address up to 64K Words 
  Supports 80-90 ns microcycle time for a 32-bit 
  high-performance system when used with the other 
  members of the Am29300 Family. 
● Real-Time Interrupt Support 
  Micro-trap and interrupts are handled transparently 
  at any microinstruction boundary. 
● Built-In Conditional Test Logic 
  Has twelve external test inputs, four of which are 
  used to internally generate four additional test 
  conditions. 
● Break-Point Logic 
  Built-in address comparator allows break-points in 
  the microcode for debugging and statistics collection. 
● Master/Slave Error Checking 
  Two sequencers can operate in parallel as a master 
  and a slave. The slave generates a fault flag for 
  unequal results. 
● 33-Level Stack 
  Provides support for interrupts, loops, and subroutine 
  nesting. It can be accessed through the D-bus 
  to support diagnostics. 
● Speed improvement with Am29331A (15% faster 
  than Am29331) 


GENERAL DESCRIPTION 


The Am29331 is a 16-bit wide, high-speed single-chip sequencer designed to
control the execution sequence of microinstructions stored in the
microprogram memory. The instruction set is designed to resemble high-level
language constructs, thereby bringing high-level language programming to the
micro level.


The Am29331 is interruptible at any microinstruction boundary to support
real-time interrupts. Interrupts are handled transparently to the
microprogrammer as an unexpected procedure call. Traps are also handled
transparently at any microinstruction boundary. This feature allows
re-execution of the prior microinstruction. Two separate buses are provided
to bring a branch address directly into the chip from two sources to avoid
slow turn-on and turn-off times for different sources connected to the
data-input bus. Four sets of multiway inputs are also provided to avoid slow
turn-on and turn-off times for different branch-address sources. This
feature allows implementation of table look-up or use of external conditions
as part of a branch address. The 33-deep stack provides the ability to
support interrupts, loops, and subroutine nesting. The stack can be read
through the D-bus to support diagnostics or to implement multitasking at the
micro-architecture level. The master/slave mode provides a complete function
check capability for the device.


The Am29331 is designed with the IMOX™ process, which allows internal ECL
circuits with TTL-compatible I/O. It is housed in a 120-lead pin-grid-array
package.


SIMPLIFIED BLOCK DIAGRAM 


IMOX is a trademark of Advanced Micro Devices, Inc.
RELATED AMD PRODUCTS


Part No.     Description
Am29C323     CMOS 32-Bit Parallel Multiplier
Am29325      32-Bit Floating-Point Processor
Am29C325     CMOS 32-Bit Floating-Point Processor
Figure 1. Am29331 Detailed Block Diagram





CONNECTION DIAGRAM
(Bottom View)


PGA*


*Pinout observed from pin side of package.
Key: VCCE = VCC, ECL
     VCCT = VCC, TTL
     GNDE = GND, ECL
     GNDT = GND, TTL











PIN DESIGNATIONS
(Sorted by Pin No.)


PIN DESIGNATIONS
(Sorted by Pin Name)


*Single +5-Volt supply.
LOGIC SYMBOL


METALLIZATION AND PAD LAYOUT


Die Size: 260 x 245 mil
Equivalent Gate Count: 2500


ORDERING INFORMATION
Standard Products


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29331   G   C   B


e. OPTIONAL PROCESSING
   Blank = Standard processing
   B = Burn-in

d. TEMPERATURE RANGE
   C = Commercial (0 to +85°C)

c. PACKAGE TYPE
   G = 120-Lead Pin Grid Array with Heatsink (CG 120)

b. SPEED OPTION
   Not Applicable

a. DEVICE NUMBER/DESCRIPTION
   Am29331/Am29331A
   16-Bit Microprogram Sequencer


Valid Combinations
   GC, GCB


Valid Combinations

Valid Combinations list configurations planned to be supported in volume for
this device. Consult the local AMD sales office to confirm availability of
specific valid combinations, to check on newly released valid combinations,
and to obtain additional data on AMD's standard military grade products.
PIN DESCRIPTION


A0-A15    Alternate Data (Input)
Input to address multiplexer and counter.

A-FULL    Almost Full (Bidirectional, Three-State)
Indicates that 28 ≤ SP ≤ 63 (meaning there are five or fewer empty
locations left on the stack). Also active during stack underflow.

CIN       Carry In (Input, Active LOW)
Carry-in to the incrementer.

CP        Clock Pulse (Input)
Clocks sequencer at the LOW-to-HIGH transition.

D0-D15    Data (Bidirectional, Three-State)
Input to address multiplexer, counter, stack, and comparator
register. Output for stack and stack pointer.

EQUAL     Equal (Bidirectional, Three-State)
Indicates that the address comparator is enabled and has
found a match.

ERROR     Error (Output, Active HIGH)
Indicates a master/slave error in the slave mode. Indicates
a malfunctioning driver or contention of any output in the
master mode.

FC        Force Continue (Input, Active HIGH)
Overrides instruction with CONTINUE.

HOLD      Hold (Input, Active HIGH)
Stops the sequencer and three-states the outputs.

I0-I5     Instruction (Input)
Selects one of 64 instructions.


FUNCTIONAL DESCRIPTION 


Architecture 


The major blocks of the sequencer are the address multiplexer, the address
register (AR), the stack (with the top of stack denoted TOS), the counter
(C), the test multiplexer with logic, and the address comparison register
(R). The bidirectional D-bus provides branch addresses and iteration counts;
it also allows access to the stack from the outside. The A-bus may be used
for map addresses. There are four sets of 4-bit multiway branch inputs (M).
The bidirectional Y-bus either outputs microprogram addresses or inputs
interrupt addresses. The buses are all 16 bits wide. Figure 1 shows a
detailed block diagram of the sequencer.


INTA      Interrupt Acknowledge (Bidirectional, Three-State, Active LOW)
Indicates that an interrupt is accepted.

INTEN     Interrupt Enable (Input, Active HIGH)
Enables interrupts.

INTR      Interrupt Request (Input, Active HIGH)
Requests the sequencer to interrupt execution.

M0-3, 0-3 Multiway (Input)
Four sets of multiway inputs providing 16-way branches.
The first index refers to the set number.

OED       Output Enable, D-Bus (Input, Active HIGH)
Enables the D-bus driver, provided that the sequencer is not
in the hold or slave mode.

RST       Reset (Input, Active LOW)
Resets the sequencer.

S0-S3     Select (Input)
Selects one of 16 test conditions.

SLAVE     Slave (Input, Active HIGH)
Makes the sequencer a slave.

T0-T11    Test (Input)
Provides external test inputs.

Y0-Y15    Address (Bidirectional, Three-State)
Output of microcode address. Input for interrupt address.


Address Multiplexer 


The address multiplexer can select an address from any of 
five sources: 


1) A branch address supplied by the D-bus
2) A branch address supplied by the A-bus
3) A multiway-branch address
4) A return or loop address from the top of stack
5) The next sequential address from the incrementer


Multiway-Branch Address


A multiway-branch address is formed by substituting the lower four bits of
the address on the D-bus (D3, D2, D1, D0) with one of the four sets (M0X,
M1X, M2X, or M3X) of 4-bit multiway-branch addresses. The multiway-branch
set is selected by the number D1D0, while the bits D3 and D2 are "don't
cares."




















[Diagram: the branch address D15-D0 and the multiway inputs M3-M0 form the
address out Y15-Y0, which indexes one of four lookup tables, Table 1 (M0X)
through Table 4 (M3X), located at the base address.]


Notes: 1. D1 and D0 select one out of four multiway sets. D3 and D2 are "don't cares."
       2. Each set of M3X-M0X can select one of sixteen locations. The multiway-branch address is the
          concatenation of D15-D4 (base address) and MX3-MX0.
       3. For a given base address, there can be four look-up tables, each sixteen deep.


Figure 2. Multiway Branch
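
The address formation described for the multiway branch can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `multiway_branch_address` and the list `m_sets` (holding the four 4-bit multiway inputs, indexed by set number) are hypothetical and do not come from the data sheet.

```python
from typing import Sequence


def multiway_branch_address(d_bus: int, m_sets: Sequence[int]) -> int:
    """Form a 16-bit multiway-branch address (illustrative model only).

    d_bus  -- 16-bit value on the D-bus
    m_sets -- four 4-bit values; m_sets[x] stands for MX3..MX0 of set x
    """
    set_number = d_bus & 0b11            # D1D0 select one of the four sets
    base = d_bus & 0xFFF0                # D15-D4 form the base address
    # D3 and D2 are "don't cares": they are masked off and never used.
    return base | (m_sets[set_number] & 0xF)


if __name__ == "__main__":
    example_sets = [0x3, 0x7, 0xA, 0xF]  # hypothetical multiway input values
    print(hex(multiway_branch_address(0x12C2, example_sets)))
```

With D1D0 = 10 the third set is selected, so the low four bits of the result come from `m_sets[2]`, whatever D3 and D2 happen to be.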
Address Register


The address register contains the current address. It is loaded
from the interrupt multiplexer and feeds the incrementer. The
incrementer is inhibited if CIN is taken HIGH.


Stack 


A 33-word-deep and 16-bit-wide stack provides first-in last-out 
storage for return addresses, loop addresses, and counter 
values. Items to be pushed come from the incrementer, the 
interrupt-return-address register, the counter, or the D-bus. 
Items popped go to the address multiplexer, the counter, or 
the D-bus. 


The access to the stack via the D-bus may be used for context switching,
stack extension, or diagnostics. As the stack is only accessible from the
top, stack extension is done by temporarily storing the whole or some lower
part of the stack outside the sequencer. The save and the later restore are
done with pop and push operations, respectively, at balanced points in the
microprogram; for example, points with the same stack depth. The internal
D-bus driver must be turned on when popping an item to the D-bus; if the
driver is off, the item will be unstacked instead. The driver is normally
turned on when the Output Enable signal is asserted and the sequencer is not
being reset (OED = 1, RST = 1).


The stack pointer is a modulo 64 counter, which is incremented on each push
and decremented on each pop. The stack pointer is reset to zero when the
sequencer is reset, but the pointer may also be reset by instruction. Thus,
the stack pointer indicates the number of items on the stack as long as
stack overflow or underflow has not occurred. Overflow happens when an item
is pushed onto a full stack, whereby the item at the bottom of the stack is
overwritten. Underflow happens when an item is popped from an empty stack;
in this case the item is undefined.


The contents of the stack pointer are present on the D-bus for all
instructions except POP D, provided the driver is turned on. The output
signal, A-FULL, is active under the following conditions: 28 ≤ SP ≤ 63.
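
A minimal sketch of the stack-pointer behavior may help here. It assumes that the A-FULL range is inclusive at both ends (28 through 63), which is inferred from the "five or fewer empty locations" remark for a 33-word stack; the class name `StackPointer` and its methods are hypothetical names chosen for illustration.

```python
STACK_DEPTH = 33   # words in the stack
SP_MODULUS = 64    # the stack pointer is a modulo-64 counter


class StackPointer:
    """Illustrative model of the stack pointer and the A-FULL flag."""

    def __init__(self) -> None:
        self.sp = 0                      # reset state

    def push(self) -> None:
        self.sp = (self.sp + 1) % SP_MODULUS

    def pop(self) -> None:
        self.sp = (self.sp - 1) % SP_MODULUS

    def reset(self) -> None:
        self.sp = 0

    @property
    def a_full(self) -> bool:
        # Assumed inclusive range: active from SP = 28 up to SP = 63.
        return 28 <= self.sp <= 63


if __name__ == "__main__":
    sp = StackPointer()
    for _ in range(28):
        sp.push()
    print(sp.sp, STACK_DEPTH - sp.sp, sp.a_full)
```

Popping an empty stack wraps the pointer from 0 to 63, which falls inside the A-FULL range; this matches the statement that A-FULL is also active during stack underflow.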


Counter 


The counter may be used as a loop counter. It may be loaded 
from the D-bus, the A-bus, or via a pop from the stack. Its 
contents may also be pushed onto the stack. 


A normal for-loop is set up by a FOR instruction, which loads the counter
from the D- or A-bus with the desired number of iterations; the instruction
also pushes onto the stack a loop address that points to the next sequential
instruction. The end of the loop is given by an unconditional END FOR
instruction, which tests the counter value against the value one and then
decrements the counter. If the values differ, the loop is repeated by
selecting the address at the top of the stack as the next address. If the
values are equal, the loop is terminated by popping the stack, thereby
removing the loop address, and selecting the address from the incrementer as
the next address. The number of iterations is a 16-bit unsigned number,
except that the number zero corresponds to 65,536 iterations. By pushing and
popping counter values it is possible to handle nested loops.
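
The test-then-decrement rule explains why a count of zero yields 65,536 iterations. The following sketch counts loop-body executions for a FOR ... END FOR pair; the function name `for_loop_iterations` is a hypothetical helper, and the model ignores everything except the 16-bit counter.

```python
def for_loop_iterations(count: int) -> int:
    """Count loop-body executions for a FOR ... END FOR pair (model only)."""
    counter = count & 0xFFFF                 # FOR loads the 16-bit counter
    iterations = 0
    while True:
        iterations += 1                      # the loop body runs once
        repeat = counter != 1                # END FOR tests the counter against one
        counter = (counter - 1) & 0xFFFF     # and then decrements it
        if not repeat:
            return iterations                # loop address popped, fall through


if __name__ == "__main__":
    for n in (1, 10, 0):
        print(n, for_loop_iterations(n))
```

Because the comparison happens before the decrement, a counter loaded with zero wraps to 0xFFFF and only reaches one after 65,535 further passes.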


Address Comparison 


The sequencer is able to compare the address from the interrupt multiplexer
with the contents of the comparator register. The instruction SET loads the
comparator register with the address on the D-bus and enables the
comparison, while CLEAR disables it. The comparison is disabled at reset. A
HIGH is present at the output EQUAL if the comparison is enabled and the two
addresses are equal. The comparison is useful for detection of a break point
or counting the number of times a microinstruction at a specific address is
executed.


Instruction Set 


The sequencer has 64 instructions that are divided into four classes of 16
instructions each. The instruction lines I0-I5 use I5 and I4 to select a
class, and I0-I3 to select an instruction within a class. The classes are:


I5  I4  Classes

0   0   Conditional sequence control,
0   1   Conditional sequence control with inverted polarity,
1   0   Unconditional sequence control, and
1   1   Special function with implicit continue.


Note that for the first three classes I5 forces the condition to be true and
I4 inverts the condition. The basic instructions of the first three classes
are shown in Table 1 and the instructions of the fourth class in Table 2.
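
The class selection and the condition rule stated with Table 1, Cond. = (Test [S] OR I5) XOR I4, can be illustrated with a short sketch. The function names `decode` and `condition` are hypothetical; the model covers only the first three classes, since the fourth class (I5 I4 = 11) always continues.

```python
from typing import Tuple


def decode(opcode: int) -> Tuple[int, int]:
    """Split a 6-bit opcode into (class from I5 I4, instruction from I3-I0)."""
    return (opcode >> 4) & 0b11, opcode & 0xF


def condition(opcode: int, test: bool) -> bool:
    """Evaluate Cond. = (Test OR I5) XOR I4 for classes 00, 01, and 10."""
    i5 = bool(opcode & 0x20)
    i4 = bool(opcode & 0x10)
    return (test or i5) != i4     # '!=' on booleans acts as XOR


if __name__ == "__main__":
    for op in (0x03, 0x12, 0x23):
        print(hex(op), decode(op), condition(op, True), condition(op, False))
```

For class 00 the condition follows the selected test, for class 01 it is inverted, and for class 10 it is always true regardless of the test.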


Structured microprogramming is supported by sequencer instructions that
singly or in pairs correspond to high-level language control constructs.
Examples are FOR I := D DOWN TO 1 DO ... END FOR and CASE N OF ... END CASE.
The instructions have been given high-level language names where
appropriate. Figure 3 shows how to microprogram important control
constructs; the high-level language is on the left and the microcode on the
right.


Test Conditions 


The condition for a conditional instruction is supplied by a test
multiplexer, which selects one out of sixteen tests with the select lines
S0-S3. Twelve of these are supplied directly by the inputs T0-T11, while the
remaining four tests are generated by the test logic from the inputs T8-T11.
The following table shows the assignments.


(S0-S3)H   Test                  Intended Use

0-7        T0-T7                 General
8          T8                    C (Carry)
9          T9                    N (Negative)
A          T10                   V (Overflow)
B          T11                   Z (Zero or equal)
C          T8 + T11              C + Z (Unsigned less than or equal, borrow mode)
D          NOT T8 + T11          NOT C + Z (Unsigned less than or equal)
E          T9 ⊕ T10              N ⊕ V (Signed less than)
F          (T9 ⊕ T10) + T11      (N ⊕ V) + Z (Signed less than or equal)


Force Continue 


The sequencer has a force continue (FC) input, which overrides the
instruction inputs I0-I5 with a CONTINUE instruction. This makes it possible
to share the microinstruction field for the sequencer instruction with some
other control or to initialize a writable control store.


Reset 


In order to start a microprogram properly, the sequencer must be reset. The
reset works like an instruction overriding both the instruction input and
the force continue input. The reset selects the address 0 at the address
multiplexer, forces the EQUAL output to LOW, and disregards a potential
interrupt request. It synchronously disables the address comparison and
initializes the stack pointer to 0. The contents of the stack are invalid
after a reset.











TABLE 1. INSTRUCTION SET for I5I4 = 00, 01, 10


Instruction          Stack (Cond.: Pass)

Goto D
Call D               Push INC
Exit D               Pop
End for D, C ≠ 1
End for D, C = 1
Goto A
Call A               Push INC
Exit A               Pop
End for A, C ≠ 1
End for A, C = 1
Goto M
Call M               Push INC
Exit M               Pop
End for M, C ≠ 1
End for M, C = 1
End Loop
Call Coroutine       Pop & Push INC
Return               Pop
End for, C ≠ 1
End for, C = 1       Pop


Cond. = (Test [S] OR I5) XOR I4
: = Concatenation
C = Counter
INC = Output of Incrementer = AR + 1 (if CIN = LOW)


Note: For unconditional instructions, the action marked under Cond.:Pass is taken. 


TABLE 2. INSTRUCTION SET for I5I4 = 11


Instruction          Stack        Comparator

Continue
For D                Push INC
Decrement
Loop                 Push INC
Pop D                Pop
Push D               Push D
Reset SP             SP ← 0
For A                Push INC
Pop C                Pop
Push C               Push C
Swap                 TOS ← C
Push C Load D        Push C
Load D
Load A
Set                               R ← D, Enable
Clear                             Disable


R = Comp. Register
Interrupts


The sequencer may be interrupted at the completion of the 
current microcycle by asserting the interrupt request input 
INTR. The return address of the interrupted routine is saved 
on the stack so that nested interrupts can be easily imple- 
mented. An interrupt is accepted if interrupts are enabled and 
the sequencer is not being reset or held (INTEN = HIGH, 
RST = HIGH, and HOLD = LOW). The interrupt-acknowledge 
output (INTA) goes LOW when an interrupt is accepted. 


When there is no interrupt, addresses go from the address 
multiplexer to the Y-bus via the driver, and to the address 
register and the comparator via the interrupt multiplexer. When 
there is an interrupt, the driver of the sequencer is turned off, 
an external driver is turned on, and the interrupt multiplexer is 
switched. The interrupt address is supplied via the external 
driver to the Y-bus, the address register, and the comparator 
(Figure 4). In order to save the address from the address 
multiplexer, the address is stored in the interrupt return 
address register, which for simplicity is clocked every cycle. 
The next microinstruction is the first microinstruction of the 
interrupt routine (Figure 5). 


In this cycle the address in the interrupt return address register 
is automatically pushed onto the stack. Therefore the microin- 
struction in this cycle must not use the stack; if a stack 
operation is programmed, the result is undefined. The instruc- 
tions that do not use the stack are GOTO D, GOTO A, GOTO 
M, CONTINUE, DECREMENT, LOAD D, LOAD A, SET and 
CLEAR. A RETURN instruction terminates the interrupt routine 
and the interrupted routine is resumed. Interrupts only work 
with a single-level control path. 
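
The acceptance rule for an interrupt request reduces to a single Boolean expression. The sketch below uses pin levels as booleans (True = HIGH); the function name `interrupt_accepted` is a hypothetical helper for illustration, not a documented interface.

```python
def interrupt_accepted(intr: bool, inten: bool, rst: bool, hold: bool) -> bool:
    """Return True when a request on INTR is accepted (model only).

    All arguments are pin levels, True meaning HIGH. RST is active LOW,
    so rst=True means the sequencer is not being reset.
    """
    return intr and inten and rst and not hold


if __name__ == "__main__":
    print(interrupt_accepted(intr=True, inten=True, rst=True, hold=False))
    print(interrupt_accepted(intr=True, inten=True, rst=True, hold=True))
```

When the function returns True, the interrupt-acknowledge output (INTA) would be driven LOW in the same cycle.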


Traps 


A trap is an unexpected situation linked to the current microinstruction
that must be handled before the microinstruction completes and changes the
state of the system. An example of such a situation is an attempt to read a
word from memory across a word boundary in a single cycle. When a trap
occurs, the current microinstruction must be aborted and re-executed after
the execution of a trap routine, which in the meantime will take corrective
measures. An interrupt, on the other hand, is not linked directly to the
current microinstruction, which can complete safely before an interrupt
routine is executed.


Execution of a trap requires that the sequencer ignore the current
microinstruction, select the trap return address at the address multiplexer,
and initiate an interrupt. This will save the trap return address on the
stack and issue the trap address from an external source (Figure 6). The
address register contains the address of the microinstruction in the
pipeline register; thus the address register already contains the trap
return address when a trap occurs. This address can be selected by the
address multiplexer by disabling the incrementer (CIN = 1), and using the
force continue mode (FC = 1). In this mode the sequencer ignores the current
microinstruction. The remaining part of the trap handling is done by the
interrupt (Figure 7); thus the section on interrupts also applies to traps.
There is one exception, however. The interrupt enable cannot be used as a
trap enable as it does not control the force continue mode and the carry-in
to the incrementer.
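
The difference between the return address saved for a trap and the one saved for an ordinary interrupt can be shown with a small sketch. It assumes the interrupted microinstruction is a CONTINUE in the interrupt case, so that the address multiplexer selects the incrementer output in both cases; all function names are hypothetical.

```python
def next_sequential(ar: int, c_in_high: bool) -> int:
    """Incrementer output: AR + 1, or AR when CIN is HIGH (model only)."""
    if c_in_high:
        return ar & 0xFFFF               # incrementer inhibited
    return (ar + 1) & 0xFFFF


def trap_return_address(ar: int) -> int:
    """FC = 1 forces CONTINUE and CIN = 1 inhibits the incrementer."""
    return next_sequential(ar, c_in_high=True)


def interrupt_return_address_on_continue(ar: int) -> int:
    """Interrupt during a CONTINUE: the next sequential address is saved."""
    return next_sequential(ar, c_in_high=False)


if __name__ == "__main__":
    ar = 0x0100
    print(hex(trap_return_address(ar)), hex(interrupt_return_address_on_continue(ar)))
```

The trap saves the address of the aborted microinstruction itself, so that a RETURN from the trap routine re-executes it; the interrupt saves the address that would have been fetched next.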


Hold Mode 


The sequencer has a hold mode in which the operation is 
suspended. 


When the HOLD signal goes active, the outputs (Y, INTA, 
A-FULL & EQUAL) are disabled and the sequencer enters the 
hold mode after the current cycle. While the sequencer is in 
this mode, the internal state is left unchanged and the D-bus is 
disabled. When the HOLD signal goes inactive, the outputs (Y, 
INTA, A-FULL & EQUAL) are enabled again and the sequencer 
leaves the hold mode after the cycle. 


In a time-multiplexed multimicroprocess system there may be one sequencer
for all processes with microprogrammed context save and restore, or there
may be one sequencer per microprocess permitting fast process switch. In the
latter case the Y-buses of the sequencers are tied together and connected to
a single microprogram store. A control unit decides on a cycle-by-cycle
basis which sequencer should be running, and activates the HOLD signal to
the remaining sequencers. The hold mode has higher priority than interrupts,
and works independently of the reset. The hold mode can only be used with a
single-level control path.


Master/Slave Configuration 


In some systems reliability is very important. The master/slave
configuration, which consists of two sequencers operated in parallel, is
able to detect faults in both the interconnect and the internal function of
the sequencers. One sequencer is the master and operates normally. The other
is the slave, i.e., all outputs except the signal ERROR are turned into
inputs and connected to the outputs of the master. Since the slave is
operated in parallel with the master, it can compare its result with the
result of the master and signal an error if they differ. The error signal
from the master indicates a malfunctioning driver or contention. Because a
TTL output goes HIGH when power is missing, the ERROR signal also indicates
power failure.











High-Level Language Constructs


An example of high-level language constructs using Am29331 instructions is given in Figure 3 (3-1, 3-2, 3-3, and 3-4).


REPEAT                        LOOP
UNTIL CC                      END LOOP NOT CC

WHILE CC DO                   LOOP
                              IF NOT CC THEN EXIT L
END WHILE                     END LOOP

LOOP                          LOOP
IF CC THEN EXIT               IF CC THEN EXIT L
END LOOP                      END LOOP
                              L:

Figure 3-1. Loops with Unknown Number of Iterations


FOR CNT := 10 DOWN TO 1 DO    FOR D 10
END FOR                       END FOR

Figure 3-2. Loop with Known Number of Iterations


                              PUSH D B
CASE I OF                     GOTO M
0: -                          A:   -
   -                               -, RETURN (TO B)
                              A+2: -
                                   -, RETURN (TO B)
                              A+4: -
                                   -, RETURN (TO B)
                              A+6: -
                                   -, RETURN
END CASE                      B:

Figure 3-3. Case Statement
(with D = A15 ... A4XX00 and M0,0-3 = A3 I1 I0 0 during the GOTO M
instruction. A1A0 must be 00, and X signifies a don't care.)


                              PUSH D C
IF X THEN                     IF NOT X THEN GOTO A
  IF Y THEN                   IF NOT Y THEN GOTO B
    -                         -
    -                         -, RETURN (TO C)
  ELSE                        B:
    -                         -
    -                         -, RETURN (TO C)
  END IF
ELSE                          A:
  IF Z THEN                   IF NOT Z THEN GOTO D
    -                         -, RETURN (TO D)
  ELSE                        D:
    -                         -, RETURN (TO C)
  END IF
END IF                        C:

Figure 3-4. Double-Nested If Statement





While executing the inet. at A, the seq is 
interrupted and directed to B. 


Executing at B. E 
Executing at A. ; 








A: Continue 
Ael: ... 


8 : Continue 


B+: ... pee 


[ = 


AF004192 re 
AF004212 


Figure 4. Am29331 Interrupt Cycle 1 Figure 5. Am29331 Interrupt Cycle 2 


A trap occurs at the inst. A, and the seq. is 
directed to B. 


Executing at A. 





: Instruction Trapped By FC = 1. 
Cin = 1. INTR = 1 | 
A+t: ... 


Wome Ge 


AF004182 
Figure 6. Am29331 Traps Cycle 1 , Figure 7. Am29331 Traps Cycle 2 } 










Instruction Set Definition 


Legend: @ = Other instruction              P = Test pass
        © = Instruction being described    F = Test fail
        CC = (Test [S3-S0])                O = Register in part


Mnemonics Description Execution Example 


BRA_D     GOTO D
Unconditional branch to the address specified by the D inputs. The D port
must be disabled to avoid bus contention.


GOTO A
Unconditional branch to the address specified by the A inputs.


GOTO Multiway (D15-D4 MX3-MX0)
Unconditional branch to the address specified by the D inputs concatenated
with the M inputs. The lower four bits on the D bus (D3-D0) are replaced by
one of the four sets of the 4-bit multiway branch addresses. The multiway
branch set is selected by bits D1 and D0 while bits D3 and D2 are "don't
cares."


GOTO TOS
Unconditional branch to the address on the top of the stack.


IF CC THEN GOTO D
ELSE CONTINUE
If CC is HIGH (pass), branch to the address specified by D. If CC is LOW
(fail), continue. The D port must be disabled to avoid bus contention.


IF CC THEN GOTO A
ELSE CONTINUE
If CC is HIGH (pass), branch to the address specified by A. If CC is LOW
(fail), continue.


IF CC THEN GOTO Multiway (D15-D4 MX3-MX0)
ELSE CONTINUE
If CC is HIGH (pass), branch to the address specified by the D inputs
concatenated with the M inputs. If CC is LOW (fail), continue. The lower
four bits on the D bus (D3-D0) are replaced by one of the four sets of the
4-bit multiway branch addresses. The multiway branch set is selected by bits
D1 and D0 while bits D3 and D2 are "don't cares."


IF CC THEN GOTO TOS
ELSE POP STACK
     CONTINUE
If CC is HIGH (pass), branch to the address on the top of the stack. If CC
is LOW (fail), pop the stack and continue.


Note: Opcode numbers are in hexadecimal notation. 
Mnemonics Description Execution Example


BRNC_D    IF NOT CC THEN GOTO D
          ELSE CONTINUE
If CC is LOW (pass), branch to the address specified by D. If CC is HIGH
(fail), continue. The D port must be disabled to avoid bus contention.


IF NOT CC THEN GOTO A
ELSE CONTINUE
If CC is LOW (pass), branch to the address specified by A. If CC is HIGH
(fail), continue.


IF NOT CC THEN GOTO Multiway (D15-D4 MX3-MX0)
ELSE CONTINUE
If CC is LOW (pass), branch to the address specified by the D inputs
concatenated with the M inputs. If CC is HIGH (fail), continue. The lower
four bits on the D bus (D3-D0) are replaced by one of the four sets of the
4-bit multiway branch addresses. The multiway branch set is selected by bits
D1 and D0 while bits D3 and D2 are "don't cares."


IF NOT CC THEN GOTO TOS
ELSE POP STACK
     CONTINUE
If CC is LOW (pass), branch to the address on the top of the stack. If CC is
HIGH (fail), pop the stack and continue.


CALL D
Unconditional branch to the subroutine specified by the D inputs. Push the
return address (Address Reg. + 1) on the stack. The D port must be disabled
to avoid bus contention.


CALL A
Unconditional branch to the subroutine specified by the A inputs. Push the
return address (Address Reg. + 1) on the stack.


CALL Multiway (D15-D4 MX3-MX0)
Unconditional branch to the subroutine specified by the D inputs
concatenated with the multiway inputs. Push the return address (Address
Reg. + 1) on the stack. The lower four bits on the D bus (D3-D0) are
replaced by one of the four sets of the 4-bit multiway branch addresses. The
multiway branch set is selected by bits D1 and D0 while bits D3 and D2 are
"don't cares."


CALL TOS
Unconditional branch to the subroutine specified by the address on the top
of the stack. The stack is popped and the return address (Address Reg. + 1)
is then pushed onto the stack.





Note: Opcode numbers are in hexadecimal notation. 








Mnemonics Description Execution Example 


CLCC_D    IF CC, THEN CALL D
          ELSE CONTINUE
If CC is HIGH (pass), call the subroutine specified by the D inputs. Push
the return address (Address Reg. + 1) on the stack. If CC is LOW (fail),
continue. The D port must be disabled to avoid bus contention.


IF CC, THEN CALL A
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine specified by the A inputs. Push
the return address (Address Reg. + 1) on the stack. If CC is LOW (fail),
continue.


IF CC, THEN CALL Multiway (D15-D4 MX3-MX0)
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine specified by the D inputs
concatenated with the M inputs. Push the return address (Address Reg. + 1)
on the stack. The lower four bits on the D bus (D3-D0) are replaced by one
of the four sets of the 4-bit multiway branch addresses. The multiway branch
set is selected by bits D1 and D0 while bits D3 and D2 are "don't cares."


IF CC, THEN CALL TOS
ELSE CONTINUE
If CC is HIGH (pass), call the subroutine specified by the address on the
top of the stack. The stack is popped and the return address (Address
Reg. + 1) is pushed onto the stack. If CC is LOW (fail), continue.


IF NOT CC, THEN CALL D
ELSE CONTINUE
If CC is LOW (pass), call the subroutine specified by the D inputs. Push the
return address (Address Reg. + 1) on the stack. If CC is HIGH (fail),
continue. The D port must be disabled to avoid bus contention.


IF NOT CC, THEN CALL A
ELSE CONTINUE
If CC is LOW (pass), call the subroutine specified by the A inputs. Push the
return address (Address Reg. + 1) on the stack. If CC is HIGH (fail),
continue.


IF NOT CC, THEN CALL Multiway (D15-D4 MX3-MX0)
ELSE CONTINUE
If CC is LOW (pass), call the subroutine specified by the D inputs
concatenated with the M inputs. Push the return address (Address Reg. + 1)
on the stack. The lower four bits on the D bus (D3-D0) are replaced by one
of the four sets of the 4-bit multiway branch addresses. The multiway branch
set is selected by bits D1 and D0 while bits D3 and D2 are "don't cares."


IF NOT CC, THEN CALL TOS
ELSE CONTINUE
If CC is LOW (pass), call the subroutine specified by the address on the top
of the stack. The stack is popped and the return address (Address Reg. + 1)
is pushed onto the stack. If CC is HIGH (fail), continue.


Note: Opcode numbers are in hexadecimal notation. 





Mnemonics Description Execution Example 


EXIT_D    EXIT TO D
Unconditional branch to the address specified by the D inputs and pop the
stack. The D port must be disabled to avoid bus contention.


EXIT TO A
Unconditional branch to the address specified by the A inputs and pop the
stack.


EXIT TO Multiway (D15-D4 MX3-MX0)
Unconditional branch to the address specified by the D inputs concatenated
with the M inputs and pop the stack. The lower four bits on the D bus
(D3-D0) are replaced by one of the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected by bits D1 and D0 while bits
D3 and D2 are "don't cares."


EXIT TO TOS
Unconditional branch to the address on the top of the stack and pop the
stack. Also used for unconditional returns.


IF CC, THEN EXIT TO D
ELSE CONTINUE
If CC is HIGH (pass), exit to the address specified by the D inputs and pop
the stack. If CC is LOW (fail), continue with no pop. The D port must be
disabled to avoid bus contention.


IF CC, THEN EXIT TO A
ELSE CONTINUE
If CC is HIGH (pass), exit to the address specified by the A inputs and pop
the stack. If CC is LOW (fail), continue with no pop.


IF CC, THEN EXIT TO Multiway (D15-D4 MX3-MX0)
ELSE CONTINUE
If CC is HIGH (pass), exit to the address specified by the D inputs
concatenated with the M inputs and pop the stack. The lower four bits on the
D bus (D3-D0) are replaced by one of the four sets of the 4-bit multiway
branch addresses. The multiway branch set is selected by bits D1 and D0
while bits D3 and D2 are "don't cares."


IF CC, THEN EXIT TO TOS
ELSE CONTINUE
If CC is HIGH (pass), exit to the address on the top of the stack and pop
the stack. If CC is LOW (fail), continue with no pop. Also used for
conditional returns.





Note: Opcode numbers are in hexadecimal notation. 
Opcode
(I5-I0)   Mnemonics   Description


12H       XTNC_D      IF NOT CC, THEN EXIT TO D
                      ELSE CONTINUE
If CC is LOW (pass), exit to the address specified by the D inputs and pop
the stack. If CC is HIGH (fail), continue with no pop. The D port must be
disabled to avoid bus contention.


16H       XTNC_A      IF NOT CC, THEN EXIT TO A
                      ELSE CONTINUE
If CC is LOW (pass), exit to the address specified by the A inputs and pop
the stack. If CC is HIGH (fail), continue with no pop.


1AH       XTNC_M      IF NOT CC, THEN EXIT TO Multiway (D15-D4 MX3-MX0)
                      ELSE CONTINUE
If CC is LOW (pass), exit to the address specified by the D inputs
concatenated with the M inputs and pop the stack. The lower four bits on the
D bus (D3-D0) are replaced by one of the four sets of the 4-bit multiway
branch addresses. The multiway branch set is selected by bits D1 and D0
while bits D3 and D2 are "don't cares."


1EH       XTNC_S      IF NOT CC, THEN EXIT TO TOS
                      ELSE CONTINUE
If CC is LOW (pass), exit to the address on the top of the stack and pop the
stack. If CC is HIGH (fail), continue with no pop. Also used for conditional
returns.


23H       DJMP_D      IF CNT ≠ 1 THEN CNT := CNT - 1
                                      GOTO D
                      ELSE CNT := CNT - 1
                           CONTINUE
If the counter is not equal to one, decrement the counter and branch to the
address specified by the D inputs. If the counter is equal to one, then
decrement the counter and continue. The D port must be disabled to avoid bus
contention.


27H       DJMP_A      IF CNT ≠ 1 THEN CNT := CNT - 1
                                      GOTO A
                      ELSE CNT := CNT - 1
                           CONTINUE
If the counter is not equal to one, decrement the counter and branch to the
address specified by the A inputs. If the counter is equal to one, then
decrement the counter and continue.


2BH       DJMP_M      IF CNT ≠ 1 THEN CNT := CNT - 1
                                      GOTO Multiway (D15-D4 MX3-MX0)
                      ELSE CNT := CNT - 1
                           CONTINUE
If the counter is not equal to one, decrement the counter and branch to the
address specified by the D inputs concatenated with the M inputs. The lower
four bits on the D bus (D3-D0) are replaced by one of the four sets of the
4-bit multiway branch addresses. The multiway branch set is selected by bits
D1 and D0 while bits D3 and D2 are "don't cares."


2FH       DJMP_S      IF CNT ≠ 1 THEN CNT := CNT - 1
                                      GOTO TOS
                      ELSE CNT := CNT - 1
                           POP STACK
                           CONTINUE
If the counter is not equal to one, decrement the counter and branch to the
address on the top of the stack. If the counter is equal to one, then
decrement the counter, pop the stack and continue.


Note: Opcode numbers are in hexadecimal notation. 
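
The multiway address formation used by the _M instructions can be sketched in Python. This is only an illustrative model, not AMD code: the function name, the 16-bit address width, and the representation of the four multiway sets as a list indexed by the value of D1-D0 are assumptions made for the example.

```python
def multiway_address(d_bus, m_sets):
    """Form a multiway branch address (illustrative model only).

    d_bus  -- 16-bit value on the D inputs (D15-D0)
    m_sets -- four 4-bit multiway values; indexing them by the value
              of D1-D0 is an assumption of this sketch
    """
    if len(m_sets) != 4:
        raise ValueError("expected four multiway sets")
    select = d_bus & 0b11                  # D1 and D0 choose the set
    low_nibble = m_sets[select] & 0xF      # selected 4-bit branch address
    return (d_bus & 0xFFF0) | low_nibble   # keep D15-D4, replace D3-D0
```

Because D3 and D2 are "don't cares", `multiway_address(0x1236, sets)` and `multiway_address(0x123E, sets)` produce the same address in this model.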


Execution Example

[Execution example diagrams PF001810 and PF001820 not reproduced.]


Opcode
(I5-I0)   Mnemonics   Description


03H   DJCC_D   IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
               GOTO D
               ELSE CNT := CNT - 1
               CONTINUE

If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D
inputs. If CC is LOW (fail) or the counter is
equal to one, then decrement the counter and
continue. The D port must be disabled to avoid
bus contention.


07H   DJCC_A   IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
               GOTO A
               ELSE CNT := CNT - 1
               CONTINUE

If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the A inputs.
If CC is LOW (fail) or the counter is equal to
one, then decrement the counter and continue.


0BH   DJCC_M   IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
               GOTO Multiway (D15-D4, Mx3-Mx0)
               ELSE CNT := CNT - 1
               CONTINUE

If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D inputs
concatenated with the M inputs. The lower four
bits on the D bus (D3-D0) are replaced by one
of the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


0FH   DJCC_S   IF CC AND CNT ≠ 1 THEN CNT := CNT - 1
               GOTO TOS
               ELSE CNT := CNT - 1
               POP STACK
               CONTINUE

If CC is HIGH (pass) and the counter is not
equal to one, decrement the counter and
branch to the address on the top of the stack. If
CC is LOW (fail) or the counter is equal to one,
then decrement the counter, pop the stack and
continue.


[Execution example diagram PF001830 not reproduced.]


Note: Opcode numbers are in hexadecimal notation.
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Opcode 
(I5-I0)   Mnemonics   Description


13H   DJNCC_D   IF NOT CC AND CNT ≠ 1 THEN
                CNT := CNT - 1
                GOTO D
                ELSE CNT := CNT - 1
                CONTINUE

If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D
inputs. If CC is HIGH (fail) or the counter is
equal to one, then decrement the counter and
continue. The D port must be disabled to avoid
bus contention.


17H   DJNCC_A   IF NOT CC AND CNT ≠ 1 THEN
                CNT := CNT - 1
                GOTO A
                ELSE CNT := CNT - 1
                CONTINUE

If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the A inputs.
The content of the interrupt return address
register and the address register is replaced by
the A address in this case. If CC is HIGH (fail)
or the counter is equal to one, the current
address is incremented, appears on the bus for
continue, and is stored into the above two
registers.


1BH   DJNCC_M   IF NOT CC AND CNT ≠ 1 THEN
                CNT := CNT - 1
                GOTO Multiway (D15-D4, Mx3-Mx0)
                ELSE CONTINUE

If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address specified by the D inputs
concatenated with the M inputs. The lower four
bits on the D bus (D3-D0) are replaced by one
of the four sets of the 4-bit multiway branch
addresses. The multiway branch set is selected
by bits D1 and D0 while bits D3 and D2 are
"don't cares."


1FH   DJNCC_S   IF NOT CC AND CNT ≠ 1 THEN
                CNT := CNT - 1
                GOTO TOS
                ELSE CNT := CNT - 1
                POP STACK
                CONTINUE

If CC is LOW (pass) and the counter is not
equal to one, decrement the counter and
branch to the address on the top of the stack. If
CC is HIGH (fail) or the counter is equal to one,
then decrement the counter, pop the stack and
continue.


2EH   RET       RETURN

Unconditional return from subroutine. The
return address is popped from the stack.


0EH   RETCC     IF CC THEN RETURN
                ELSE CONTINUE

If CC is HIGH (pass), return from subroutine.
The return address is popped from the stack. If
CC is LOW (fail), continue.


1EH   RETNC     IF NOT CC THEN RETURN
                ELSE CONTINUE

If CC is LOW (pass), return from subroutine.
The return address is popped from the stack. If
CC is HIGH (fail), continue.


Note: Opcode numbers are in hexadecimal notation.
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Execution Example — 


[Execution example diagrams PF001840 and PF001850 not reproduced.]


Opcode
(I5-I0)   Mnemonics   Description


31H   FOR_D    INITIALIZE LOOP

Push the Address Reg. + 1 on the stack, load
the counter from the D inputs and continue.
Use with DJMP_S for FOR...NEXT loops.
The D port must be disabled to avoid bus
contention.


37H   FOR_A    INITIALIZE LOOP

Push the Address Reg. + 1 on the stack, load
the counter from the A inputs and continue.
Use with DJMP_S for FOR...NEXT loops.


33H   LOOP     INITIALIZE LOOP

Push the Address Reg. + 1 on the stack and
continue. Use with BRCC_S for
REPEAT...UNTIL loops, or with XTCC_D
and BRA_S for WHILE...END WHILE loops.


34H   POP_D

Pop the stack and output the value on the D
outputs and continue. The D port must be
enabled.


38H   POP_C

Pop the stack and store the value in the
counter and continue.


35H   PUSH_D

Push the D inputs on the stack and continue.
The D port must be disabled to avoid bus
contention.


39H   PUSH_C

Push the counter on the stack and continue.


3AH   SWAP

Exchange the counter and the top of stack and
continue.


Note: Opcode numbers are in hexadecimal notation.
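
The FOR...NEXT pairing of FOR_D with DJMP_S can be traced with a small Python model. This is a sketch under stated assumptions, not a description of the device internals: the microprogram addresses (50 for FOR_D, 52 for DJMP_S), the function name, and the treatment of every other address as a plain continue are hypothetical, and only loop counts of one or more are modeled.

```python
def run_for_loop(count, for_addr=50, djmp_addr=52, max_steps=1000):
    """Return the sequence of addresses visited by a FOR_D ... DJMP_S loop.

    The addresses are hypothetical; all other locations act as CONT.
    """
    stack = []
    counter = 0
    pc = for_addr
    trace = []
    for _ in range(max_steps):
        trace.append(pc)
        if pc == for_addr:
            stack.append(pc + 1)      # FOR_D: push Address Reg. + 1
            counter = count           # FOR_D: load counter from D inputs
            pc += 1
        elif pc == djmp_addr:
            if counter != 1:          # DJMP_S: loop back to top of stack
                counter -= 1
                pc = stack[-1]
            else:                     # DJMP_S: last pass, pop and continue
                counter -= 1
                stack.pop()
                pc += 1
                trace.append(pc)
                return trace
        else:
            pc += 1                   # any other instruction: continue
    raise RuntimeError("loop did not terminate within max_steps")
```

In this model a counter value of N makes the loop body run N times, because the test for "counter equal to one" happens before the final decrement.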
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Execution Example 


[Execution example diagrams PF001860 and PF001870 not reproduced.]




















Mnemonics   Description


STACK_C     Push the counter on the stack, load the
            counter with the value of the D inputs, and
            continue.


LOAD_D      Load the counter with the value of the D inputs
            and continue. The D port must be disabled to
            avoid bus contention.


LOAD_A      Load the counter with the value of the A inputs
            and continue.


CONT        Continue.
DECR        Decrement the counter and continue.
RESET_SP    Reset the stack pointer and continue.


            Load the comparison register with the value of
            the D inputs, enable the comparator and
            continue.


            Disable the comparator and continue.


Note: Opcode numbers are in hexadecimal notation.
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Execution Example 


[Execution example diagrams PF001880, PF001890, and PF001900 not reproduced.]





APPLICATIONS 


[Block diagram BD006220 not reproduced. Blocks shown: Am29331 (with Test and Interrupt Vector inputs and Address output), Microprogram Memory, Pipeline Register, Am29332 ALU, and Status Register.]


Figure 8. Typical Control-Path Architecture For Am29300 Family


[Waveform WF021091 not reproduced. Intervals shown within one cycle: ALU status register output to Am29331 test inputs (clock to register status outputs of the Am29332); Am29331 outputs (test inputs to Y outputs); microprogram memory outputs; register setup time.]


Figure 9. Cycle Timing Waveform*


* This waveform shows the timing relationship for the configuration shown in Figure 8.
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Suggestions for Power and Ground Pin 
Connections


The Am29331 operates in an environment of fast signal rise 
times and substantial switching currents. Therefore, care must 
be exercised during circuit board design and layout, as with 
any high-performance component. The following is a sug- 
gested layout, but since systems vary widely in electrical 
configuration, an empirical evaluation of the intended layout is 
recommended. 


The VCCT and GNDT pins, which carry output driver switching
currents, tend to be electrically noisy. The VCCE and GNDE
pins, which supply the ECL core of the device, tend to produce
less noise, and the circuits they supply may be adversely
affected by noise spikes on the VCCE plane. For this reason, it
is best to provide isolation between the VCCE and VCCT pins,
as well as independent decoupling for each. Isolating the
GNDE and GNDT pins is not required.


Printed Circuit-Board Layout Suggestions


1. Use of a multi-layer PC board with separate power, ground,
and signal planes is highly recommended.


2. All VCCE and VCCT pins should be connected to the VCC
plane. VCCT pins should be isolated from VCCE pins by means
of a slot cut in the VCC plane; see Figure 10. By physically
separating the VCCE and VCCT pins, coupled noise will be
reduced.


3. All GNDE and GNDT pins should be connected directly to
the ground plane.


4. The VCCT pins should be decoupled to ground with a 0.1-µF
ceramic capacitor and a 10-µF electrolytic capacitor, placed
as closely to the Am29331 as is practical. VCCE pins should
be decoupled to ground in a similar manner.


A suggested layout is shown in Figure 10.
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Figure 10. Suggested Printed Circuit-Board Layout 
[Plot OP002612 not reproduced. Vertical axis: thermal resistance θJA (°C/W); horizontal axis: air velocity (linear feet per minute), 0 to 600. Curve labels: θJA Still Air, θJA 200 LFM, θJA 600 LFM, Heat Sink.]

Figure 11. Am29331 Thermal Characteristics (Typical) 
ABSOLUTE MAXIMUM RATINGS

Storage Temperature .......................... -65 to +150°C
Temperature Under Bias (TC) .................. -55 to +125°C
Supply Voltage to Ground Potential
  Continuous ................................. -0.5 to +7.0 V
DC Voltage Applied to Outputs
  for High State ............................. -0.5 V to +VCC Max.
DC Input Voltage ............................. -0.5 to +5.5 V

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES

Commercial (C) Devices
  Temperature (TC) ........................... 0 to +85°C
  Supply Voltage (VCC) ....................... +4.75 to +5.25 V
  Air Velocity ............................... 200 linear feet per minute

Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating range


Parameters   Description and Test Conditions (Note 1)

VOH          Output HIGH Voltage
             VCC = Min., VIN = VIL or VIH
             IOH = -1.6 mA for Y0-Y15, INTA
             IOH = -1.2 mA for all others

VOL          Output LOW Voltage
             VCC = Min., VIN = VIL or VIH
             IOL = 16 mA for Y0-Y15, INTA
             IOL = 12 mA for all others

VIH          Input HIGH Level
             Guaranteed input logical HIGH voltage for all inputs

VIL          Input LOW Level
             Guaranteed input logical LOW voltage for all inputs

IIL          Input LOW Current
             VCC = Max., VIN = 0.5 V
             Specified separately for Y0-Y15, D0-D15, INTA, A-FULL, EQUAL
             and for the A0-A15, multiway, I0-I5, test, S0-S3, FC,
             SLAVE, and HOLD inputs

IIH          Input HIGH Current
             VCC = Max., VIN = 2.4 V
             Specified separately for the same two pin groups as IIL

II           Input HIGH Current
             VCC = Max., VIN = 5.5 V

IOZH, IOZL   Off State (High-Impedance) Output Current
             VCC = Max., VO = 2.4 V (limit 100) and VO = 0.5 V

ISC          Output Short Circuit Current (Note 2)
             VCC = Max. + 0.5 V, VOUT = +0.5 V

ICC          Power Supply Current (Note 3)
             VCC = Max., COM'L only
             TC = 0 to +85°C: 1,300 mA
             TC = +85°C: 1,200 mA


Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type.
2. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second.
3. Measured with all inputs LOW and outputs disabled.
4. It is the responsibility of the user to maintain a case temperature of +85°C or less. AMD recommends an air velocity of at least 200 linear
feet per minute over the heatsink.
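
A rough worst-case supply power figure follows from the operating-range and supply-current entries above. The Python sketch below multiplies the maximum VCC (+5.25 V) by the 1,300 mA supply current as read from the table, and then estimates the case-to-ambient thermal resistance needed to hold the +85°C case limit. The function names and the 50°C ambient used in the usage note are assumptions for illustration; the estimate ignores current delivered to output loads.

```python
VCC_MAX_V = 5.25      # maximum supply voltage from Operating Ranges
ICC_MAX_A = 1.300     # supply current as read from the DC table (1,300 mA)
TCASE_MAX_C = 85.0    # maximum case temperature from Operating Ranges


def worst_case_power_w(vcc_v=VCC_MAX_V, icc_a=ICC_MAX_A):
    """Supply power in watts, P = VCC x ICC (output loading ignored)."""
    return vcc_v * icc_a


def max_case_to_ambient_resistance(ambient_c, power_w=None):
    """Largest case-to-ambient thermal resistance (deg C/W) that keeps
    the case at or below TCASE_MAX_C for a given ambient temperature."""
    if power_w is None:
        power_w = worst_case_power_w()
    if ambient_c >= TCASE_MAX_C:
        raise ValueError("ambient must be below the case temperature limit")
    return (TCASE_MAX_C - ambient_c) / power_w
```

For example, `worst_case_power_w()` gives about 6.8 W, and with an assumed 50°C ambient `max_case_to_ambient_resistance(50.0)` gives roughly 5.1°C/W, which is why forced air over the heatsink is recommended.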
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SWITCHING CHARACTERISTICS over operating range (Note 1) 
A. COMBINATIONAL PROPAGATION DELAYS 
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Notes: See notes following Table C. 
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SWITCHING CHARACTERISTICS (Cont'd.) 


B. OUTPUT DISABLE TIME 


Description 


Reset-to-Address Enable 
Reset-to-Address Disable 
INTR-to-Address Enable 
INTR-to-Address Disable 
INTEN-to-Address Enable 
INTEN-to-Address Disable 
HOLD-to-Address Enable 
HOLD-to-Address Disable 
SLAVE-to-Address Enable 
SLAVE-to-Address Disable 
OED-to-Data Enable 
OED-to-Data Disable 
Reset-to-Data Enable 
Reset-to-Data Disable 
SLAVE-to-Data Enable 
SLAVE-to-Data Disable 
Clock-to-Data Enable 
Clock-to-Data Disable 
HOLD-to-INTA Enable 
HOLD-to-INTA Disable 
HOLD-to-A-FULL Enable 
HOLD-to-A-FULL Disable 
HOLD-to-EQUAL Enable 
HOLD-to-EQUAL Disable 
SLAVE-to-INTA Enable 
SLAVE-to-INTA Disable 
SLAVE-to-A-FULL Enable 
SLAVE-to-A-FULL Disable 
SLAVE-to-EQUAL Enable 
SLAVE-to-EQUAL Disable 


Notes: See notes following Table C. 
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SWITCHING CHARACTERISTICS (Cont'd.) 





C. SETUP AND HOLD TIMES


Data Setup
Data Hold
Alternate Data Setup
Alternate Data Hold
Multiway Setup
Multiway Hold
Address Setup
Address Hold
Instruction Setup
Instruction Hold
Forced Continue Setup
Forced Continue Hold
Test Setup
Test Hold
Select Setup
Select Hold
Reset Setup
Reset Hold
Interrupt Request Setup
Interrupt Request Hold
Interrupt Enable Setup
Interrupt Enable Hold
Hold Mode Setup
Hold Mode Hold
Carry-In Setup
Carry-In Hold








4 
8 
3 
8 
2 
5 
3 
1 
1 
1 
0 
6 
0 

16 
0 
5 
2 
8 
2 
8 
2 
5 
3 
0 
0


Notes: 1. It is the responsibility of the user to maintain a case temperature of +85°C or less. AMD recommends
an air velocity of at least 200 linear feet per minute over the heatsink.

2. (INTR, INTEN)-to-EQUAL is the sum of (INTR, INTEN)-to-Y disable time and Y-to-EQUAL delay time.
This is not tested due to bus turnaround in Master/Slave mode.

3. The status of I5-I0 and FC must not be changed during the Clock LOW time.

4. CL = 50 pF; CL = 5 pF for Disable Time only.

5. Z = Three-state output path; use Table B.
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SWITCHING TEST CIRCUIT 


[Test circuit diagram TC003420 not reproduced.]


A. Three-State Outputs


Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = 5.0 pF for output disable tests.
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SWITCHING TEST WAVEFORMS 


[Waveform WFR02970 not reproduced.]

Notes: 1. Diagram shown for HIGH data only. Output
transition may be opposite sense.
2. Cross-hatched area is don't care condition.


Setup, Hold, and Release Times


[Waveform WFR02980 not reproduced. Traces: same-phase input transition and opposite-phase input transition, 0 V to 3 V.]


Propagation Delay


[Waveform WFR02790 not reproduced. Traces: LOW-HIGH-LOW pulse and HIGH-LOW-HIGH pulse.]


Pulse Width


[Waveform WFR02663 not reproduced. Traces: control input (disable, enable), output normally LOW, output normally HIGH.]

Notes: 1. Diagram shown for Input Control Enable-LOW
and Input Control Disable-HIGH.
2. S1, S2, and S3 of Load Circuit are closed
except where shown.


Enable and Disable Times
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Notes on Test Methods 


The following points give the general philosophy which we
apply to tests which must be properly engineered if they are to
be implemented in an automatic environment. The specifics of
which philosophies apply to which tests are shown.


1. Ensure the part is adequately decoupled at the test head.
Large changes in supply current when the device switches
may cause function failures due to VCC changes.


2. Do not leave inputs floating during any tests, as they may
oscillate at high frequency.


3. Do not attempt to perform threshold tests at high speed.
Following an input transition, ground current may change by
as much as 400 mA in 5-8 ns. Inductance in the ground
cable may allow the ground pin at the device to rise by
hundreds of millivolts momentarily.


4. Use extreme care in defining input levels for AC tests. Many
inputs may be changed at once, so there will be significant
noise at the device pins which may not actually reach VIL or
VIH until the noise has settled. AMD recommends using
VIL ≤ 0 V and VIH ≥ 3 V for AC tests.


5. To simplify failure analysis, programs should be designed to
perform DC, Function, and AC tests as three distinct groups
of tests.


6. Capacitive Loading for AC Testing


Automatic testers and their associated hardware have stray
capacitance which varies from one type of tester to
another, but is generally around 50 pF. This makes it
impossible to make direct measurements of parameters
which call for a smaller capacitive load than the associated
stray capacitance. Typical examples of this are the so-called
"float delays" which measure the propagation
delays into and out of the high-impedance state, and are
usually specified at a load capacitance of 5.0 pF. In these
cases, the test is performed at the higher load capacitance
(typically 50 pF), and engineering correlations based on
data taken with a bench setup are used to predict the result
at the lower capacitance.


Similarly, a product may be specified at more than one
capacitive load. Since the typical automatic tester is not
capable of switching loads in mid-test, it is impossible to
make measurements at both capacitances even though
they may both be greater than the stray capacitance. In
these cases, a measurement is made at one of the two
capacitances. The result at the other capacitance is
predicted from engineering correlations based on data
taken with a bench setup and the knowledge that certain
DC measurements (IOH, IOL, for example) have already
been taken and are within specification. In some cases,
special DC tests are performed in order to facilitate this
correlation.


7. Threshold Testing


The noise associated with automatic testing, the long
inductive cables, and the high gain of bipolar devices when
in the vicinity of the actual device threshold frequently give
rise to oscillations when testing high-speed circuits. These
oscillations are not indicative of a reject device, but instead,
of an overtaxed test system. To minimize this problem,
thresholds are tested at least once for each input pin.
Thereafter, "hard" high and low levels are used for other
tests. Generally this means that function and AC testing are
performed at "hard" input levels rather than at VIL max.
and VIH min.


8. AC Testing


Occasionally parameters are specified which cannot be
measured directly on automatic testers because of tester
limitations. Data input hold times often fall into this category.
In these cases, the parameter in question is guaranteed
by correlating these tests with other AC tests which have
been performed. These correlations are arrived at by the
cognizant engineer by using data from precise bench
measurements in conjunction with the knowledge that
certain DC parameters have already been measured and
are within specification.


In some cases, certain AC tests are redundant since they
can be shown to be predicted by other tests which have
already been performed. In these cases, the redundant
tests are not performed.


SWITCHING WAVEFORMS
KEY TO SWITCHING WAVEFORMS


[Waveform symbols not reproduced.] Input meanings listed in the key:

MUST BE STEADY
MAY CHANGE FROM H TO L
MAY CHANGE FROM L TO H
DON'T CARE; ANY CHANGE PERMITTED
DOES NOT APPLY
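
The remark that ground-cable inductance can lift the device ground pin by hundreds of millivolts follows from V = L × di/dt. The Python sketch below applies that relation to the 400 mA in 5-8 ns figure quoted in the note on threshold tests. The 5 nH inductance is purely an assumed, illustrative value for a short ground return; it is not taken from the data sheet.

```python
def ground_shift_v(inductance_h, delta_i_a, delta_t_s):
    """Momentary ground shift from V = L * di/dt (linear ramp assumed)."""
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return inductance_h * delta_i_a / delta_t_s


ASSUMED_GROUND_INDUCTANCE_H = 5e-9   # assumption: 5 nH ground return

# 400 mA change in 5 ns and in 8 ns, as quoted in the text
fast_edge_v = ground_shift_v(ASSUMED_GROUND_INDUCTANCE_H, 0.400, 5e-9)
slow_edge_v = ground_shift_v(ASSUMED_GROUND_INDUCTANCE_H, 0.400, 8e-9)
```

Under this assumption the shift is about 0.4 V for the 5 ns case and 0.25 V for the 8 ns case, which is comparable to TTL noise margins and is the reason threshold tests are not run at high speed.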
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SWITCHING WAVEFORMS (Cont'd.) 


[Waveform WFR02990 (delay) not reproduced.]


[Waveform WF025100 not reproduced. Traces shown over Cycle 1 and Cycle 2: CLOCK, HOLD, INTEN (Note 1), INT-VECT BUFFER, ADDRESS REGISTER (Note 3), INTERRUPT RETURN ADDRESS REGISTER (Note 3).]


Interrupt Timing


Notes: 1. Interrupt Request comes from an interrupt-controller register. It reflects the CP ↑ to INTR time of
the interrupt controller.
2. During Cycle 2, there may be contention on the Y-bus if the Y-bus is turned ON before the INT-VECT
buffer is turned OFF.
3. Refer to Figures 4 and 5 for definition of A and B.
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SWITCHING WAVEFORMS (Cont'd.) . 
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INPUT/OUTPUT INTERFACE CONDITIONS 
(All Devices) 
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Am29332 


32-Bit Arithmetic Logic Unit 


DISTINCTIVE CHARACTERISTICS 


• Single Chip, 32-Bit ALU
  Supports 80-90 ns microcycle time for the 32-bit
  data path. It is a combinatorial ALU with equal
  cycle time for all instructions.

• Flow-through Architecture
  A combinatorial ALU with two input data ports and
  one output data port allows implementation of either
  parallel or pipelined architectures.

• 64-Bit In, 32-Bit Out Funnel Shifter
  This unique functional block allows n-bit shift-up,
  shift-down, 32-bit barrel shift or 32-bit field extract.

• Supports All Data Types
  It supports one-, two-, three- and four-byte data for
  all operations and variable-length fields for logical
  operations.

• Multiply and Divide Support
  Built-in hardware to support two-bit-at-a-time modified
  Booth's algorithm and one-bit-at-a-time division
  algorithm.

• Extensive Error Checking
  Parity check and generate provides data transmission
  check and master/slave mode provides complete
  function checking.


GENERAL DESCRIPTION 


The Am29332 is a 32-bit wide non-cascadable Arithmetic
Logic Unit (ALU) with integration of functions that normally
don't cascade, such as barrel shifters, priority encoders
and mask generators. Two input data ports and one output
data port provide flow-through architecture and allow the
designer to implement his/her architecture with any degree
of pipelining and no built-in penalties for branching. Also,
the simplicity of a three-bus ALU allows easy implementation
of parallel or reconfigurable architectures. The register
file is off-chip to allow unlimited expansion and regular
addressability.


The Am29332 supports one-, two-, three- and four-byte
data for arithmetic and logic operations. It also supports
multiprecision arithmetic and shift operations. For logical
operations, it can support variable-length fields up to 32
bits. When fewer than four bytes are selected, unselected
bits are passed to the destination without modification. The
device also supports two-bit-at-a-time modified Booth's
algorithm for high-speed multiplication and one-bit-at-a-time
division. Both signed and unsigned integers for all
byte-aligned data types mentioned above are supported.


The Am29332 is designed to support 80-90 ns microcycle
time. The device is packaged in a 169-lead pin-grid-array
package.
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RELATED AMD PRODUCTS 


Am29001 
Am29G10A 
Am290101 
Ameo 2 
Am2oit4 
Am29C116 CMOS 16-Bit Microcontroller 

Am290325 
Am29625 
Am290325 
Am29334   64x18 Four-Port, Dual-Access Register File


CONNECTION DIAGRAM 
169-Lead PGA 
Bottom View 


[Pin-grid diagram not reproduced.]


Key: VCCE = VCC, ECL
VCCT = VCC, TTL
GNDE = GND, ECL
GNDT = GND, TTL
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PIN DESIGNATIONS 


(Sorted by Pin Names) 
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LOGIC SYMBOL — 


[Logic symbol not reproduced. Pins shown include DA0-DA31, SLAVE, BOROW, MCin, MLINK, M/m, CP, RS, Y0-Y31, PY0-PY3, and MSERR.]





Die size: 367 x 387 mils 
Gate Count: 5200 
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ORDERING INFORMATION 


Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid
Combination) is formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29332 G C B


a. DEVICE NUMBER/DESCRIPTION
   Am29332/Am29332A
   32-Bit Arithmetic Logic Unit

b. SPEED OPTION
   Not Applicable

c. PACKAGE TYPE
   G = 169-Lead Pin Grid Array with Heatsink
   (CG 169)

d. TEMPERATURE RANGE
   C = Commercial (0 to +85°C)

e. OPTIONAL PROCESSING
   Blank = Standard processing
   B = Burn-in


Valid Combinations

GC, GCB

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released combinations, and
to obtain additional data on AMD's standard military grade
products.
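
The order-number scheme above can be illustrated with a small Python decoder. This is a hypothetical helper written for this example only; the function name, the regular expression, and the returned dictionary layout are assumptions, and the lookup tables contain just the codes listed in this section.

```python
import re

PACKAGE_TYPES = {"G": "169-Lead Pin Grid Array with Heatsink (CG 169)"}
TEMPERATURE_RANGES = {"C": "Commercial (0 to +85°C)"}
PROCESSING = {"": "Standard processing", "B": "Burn-in"}

# device number, package letter, temperature letter, optional burn-in flag
_ORDER_RE = re.compile(r"^(AM29332A?)\s*([A-Z])\s*([A-Z])\s*(B?)$")


def decode_order_number(order):
    """Split an order number such as 'AM29332GCB' into its fields."""
    match = _ORDER_RE.match(order.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized order number: {order!r}")
    device, package, temperature, processing = match.groups()
    if package not in PACKAGE_TYPES or temperature not in TEMPERATURE_RANGES:
        raise ValueError(f"not a listed combination: {order!r}")
    return {
        "device": device,
        "package": PACKAGE_TYPES[package],
        "temperature": TEMPERATURE_RANGES[temperature],
        "processing": PROCESSING[processing],
    }
```

For instance, `decode_order_number("AM29332 G C B")` reports burn-in processing, while `"AM29332GC"` reports standard processing. Availability of any combination still has to be confirmed with the sales office.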
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PIN DESCRIPTION 


BOROW   Borrow (Input)
When HIGH, the Carry In and Carry Out are borrows for
subtract operations.


C, Z, N, V, L   Status (Input/Output)
When the Register Status pin is LOW, these pins give the
Carry, Zero, Negative, Overflow and Link outputs of the ALU
where applicable to the instruction being executed. When
not applicable to the instruction being executed, or when the
Register Status pin is HIGH, these pins give the outputs of
the Carry, Zero, Negative, Overflow and Link bits of the
internal Status Register. In Slave mode, C, Z, N, V and L
become inputs.


CP   Clock Input (Input)
Clocks internal registers (status, Q) at the LOW to HIGH
transition, provided HOLD input is LOW.


DA0-DA31   Data Input for DA-bus (Input)
Data input lines for operand A.


DB0-DB31   Data Input for DB-bus (Input)
Data input lines for operand B.


HOLD   Hold (Input, Active HIGH)
When HIGH, it inhibits the update of the status and Q
registers.


I0-I6   Instruction Inputs (Input)
Used to select the operation to be performed.


I7-I8   Byte Width Inputs (Input)
Byte width inputs for byte boundary aligned operand
instructions. Selects the sources for width and position
inputs for variable field bit operands. If I7 is LOW it selects
the width input from pins W4-W0. If I7 is HIGH the width
input is selected from the internal width register. Similarly if
I8 is LOW it selects the position inputs from pins P5-P0 and
if HIGH it selects input from the internal position register.


MCin   Macro Status Carry (Input)
External Carry input.


MLINK   Macro Status Link (Input)
External link input.


M/m   Macro/Micro Select (Input)
When HIGH, selects macro carry and macro link pins as
input instead of micro carry and micro link from the
micro-status register.


MSERR   Master-Slave Error (Output)
When HIGH, this signal indicates that the master's and
slave's data were not identical.


OE-Y   Output Enable (Input, Active LOW)
When OE-Y is HIGH the Y-bus is disabled (three-stated).


P0-P5   Position Inputs (Input)
Position input to select the position of the least significant bit
of a field. Also indicates the amount by which data is to be
shifted up (P5 = LOW) or down (P5 = HIGH) or rotated.


PA0-PA3   Parity Input for DA-bus (Input)
Parity input for operand A on DA-bus (one per byte).
Even parity is used for the Am29332.


PB0-PB3   Parity Input for DB-bus (Input)
Parity input for operand B on DB-bus (one per byte).


PERR   Parity Error (Input/Output)
When HIGH, indicates that a parity error was detected on
the DA or DB inputs.


PY0-PY3   Parity for Y-bus (Input/Output)
Parity output for data on Y-bus (one per byte). Even parity is
used for the Am29332. In slave mode, PY0-PY3 become
inputs.


RS   Register Status Mode Pin (Input)
Selects between ALU status (Register Status = LOW) or
register status (Register Status = HIGH) on the C, Z, N, V
and L outputs.


SLAVE   Slave (Input)
When HIGH, this pin puts the ALU in the slave mode. All
output pins become input pins and signals on them are
compared with the ALU's internally generated results. When
OE-Y is HIGH, the Y0-Y31 and PY0-PY3 inputs are
ignored. When the SLAVE pin is LOW, the ALU is put in
master mode where outputs are generated as normal.


W0-W4   Width Inputs (Input)
Width input to select the width of a contiguous bit field.


Y0-Y31   Data Out/In Lines (Input/Output)
When OE-Y is LOW and the ALU is in the Master mode, the
ALU result is enabled on the Y-bus. When OE-Y is HIGH,
the Y-bus is three-stated. In Slave mode the Y-bus acts as
external data input.
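
The per-byte even parity carried on the PA, PB, and PY pins can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and the mapping of parity bit 0 to the least significant byte (bits 7-0) is an assumption, since the byte-to-pin assignment is not stated in the pin description.

```python
def even_parity_bits(word):
    """Return four even-parity bits for a 32-bit word, one per byte.

    Index 0 is assumed to cover bits 7-0 (least significant byte).
    With even parity, each byte plus its parity bit holds an even
    number of ones, so the parity bit is the XOR of the byte's bits.
    """
    bits = []
    for byte_index in range(4):
        byte = (word >> (8 * byte_index)) & 0xFF
        bits.append(bin(byte).count("1") & 1)
    return bits


def parity_error(word, parity_bits):
    """True if any supplied parity bit disagrees with the data,
    modeling the condition that would be flagged on PERR."""
    return list(parity_bits) != even_parity_bits(word)
```

A generator on the source side would drive `even_parity_bits(word)` alongside the data, and a checker would evaluate `parity_error(word, received_bits)` on the receiving side.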
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Figure 1. Detailed Block Diagram 
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Figure 2. Am29332 Family High-Performance System Block Diagram 


PRODUCT OVERVIEW 


The Am29332 is a 32-bit wide, high-performance, non-expandable
Arithmetic Logic Unit (ALU). It has two 32-bit wide input
ports (A and B) and one 32-bit wide output port (Y). These 
three ports provide flexibility and accessibility for high-perfor- 
mance processor designs. Dedicated input and output ports 
provide a flow-through architecture and avoid the penalty 
associated with switching the bus half-way through the cycle 
for input and output of data. The chip is designed for use with 
a dual-access RAM (Am29334) as a register file. In addition, 
the three-bus architecture facilitates the connection of other 
arithmetic units in parallel with the Am29332 for high-perfor- 
mance systems. 


The Am29332 supports one-, two-, three-, and four-byte 
arithmetic operations. It also supports multiprecision arithme- 
tic and multiple bit shifts. For logical operations, it can handle 
variable-length fields of up to 32 bits. The chip incorporates 
dedicated hardware to allow efficient implementation of a two 
bit-at-a-time (modified Booth) multiply algorithm, supporting 
signed and unsigned arithmetic data types. Similarly, hardware 
is provided to support a bit-at-a-time divide algorithm, also 
supporting signed and unsigned arithmetic data types. An 
internal 32-bit register (Q) is used by the multiply and divide 
hardware for double precision operands. For business applica- 
tions, the Am29332 supports variable-length BCD arithmetic. 


Field logical instructions operate on bit-fields taken from the A 
and B data inputs; they may be of variable width and starting 
position. A is normally the source input and B the destination 
input. In general, destination bits not falling within a specified 
field are passed by the ALU unchanged. Field width and 
position are specified either by direct inputs to the chip, or by 
entries in the status register. There are two kinds of field 
logical instructions — aligned and non-aligned. The first type of 
instruction assumes that source and destination fields are 
aligned and the operation is performed only for bits within the 
specified fields. In the second type of instruction, source and 
destination fields are normally non-aligned. However, it is 
always assumed that one field (either source or destination) is 
least-significant-bit (LSB) aligned. 


If the destination field is LSB aligned then the source field is 
downshifted in order to make it LSB aligned as well. 
Downshifting is accomplished by making the 6-bit position input 
equal to the two's complement of the number of places the 
field is to be downshifted. If the source field is LSB aligned 
then it is upshifted in order to align it with the destination. 
Upshifting is accomplished by making the position inputs equal 
to the number of places the field is to be upshifted. Any other 
type of field operation is not allowed. Whenever the field 
crosses the word boundary, the portion not falling within the 
word boundary is ignored. This effect is useful when perform- 
ing operations on fields that overlap two different words. 
Instructions to perform straightforward multiple-bit shifts (ei- 
ther up or down) are also provided. Additionally, it is possible 
to extract a bit-field from a word in one instruction, even if that 
field overlaps a word boundary. 
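The non-aligned field operations described above can be illustrated with a small behavioral sketch. It is not the chip's logic: the function names `insert_field` and `extract_field` are hypothetical, and zero fill of the unselected bits on extract is an assumption made for the illustration.

```python
MASK32 = 0xFFFFFFFF

def insert_field(a, b, width, position):
    """Source field is LSB aligned in A; upshift it by `position` and
    merge it into destination B. Bits of B outside the field pass through."""
    field = ((1 << width) - 1) << position      # 1s over the destination field
    shifted = (a << position) & MASK32          # bits crossing bit 31 are dropped
    return (shifted & field) | (b & ~field & MASK32)

def extract_field(a, width, position):
    """Source field sits at `position` in A; downshift it so that it is
    LSB aligned. Unselected result bits are zero in this sketch."""
    return (a >> position) & ((1 << width) - 1)
```

For instance, `insert_field(0xA, 0xFFFFFFFF, 4, 8)` places the nibble 0xA into bits 8 to 11 of an all-ones destination.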


The power and the flexibility of the processor comes partly 
from its ability to generate a mask to control the width of an 
operation for each instruction without any overhead. For all 
byte aligned instructions (three quarters of the instruction set), 
the mask is either 1, 2, 3 or 4 bytes wide and is generated from 
the byte width input (I8 - I7). For all field instructions the mask 
is of variable width and is generated from the position inputs 
(P0 - P5) and the width inputs (W0 - W4). Table 1 describes 
the position displacement from the position inputs and Table 2 
the bit field from the width inputs. 


TABLE 1. POSITION INPUTS AND BIT 
DISPLACEMENT 
TABLE 2. WIDTH INPUTS AND BIT FIELD 

W4  W3  W2  W1  W0    Bit Field 
0   0   0   0   0     32 
0   0   0   0   1     1 
0   0   0   1   0     2 
.   .   .   .   .     . 
1   1   1   1   1     31 


Whenever the width of the operand is less than 32-bits, all 
unselected bits from the inputs of the ALU are passed to the 
output without any modification. Depending upon the instruc- 
tion type, unselected bits are taken from different sources. For 
example in all single operand instructions, bits from the source 
operand (from either A or B input) are passed in unselected bit 
positions. For two operand instructions, bits from the B input 
are passed in unselected bit positions. There are some 
exceptions which are explained in the instruction set section. 


The processor has a 32-bit status register to indicate the 
status of different operations performed. The status register is 
loaded at the rising edge of the clock with new status unless 
the HOLD signal is HIGH. The bit position for each status bit is 
given in the functional description. The least significant byte of 
the status register holds the six position bits (PR0 - PR5). The 
two most significant bits of this byte may be read or loaded but 
are otherwise unused by the ALU. The second byte (bits 8 to 
15) consists of the five width bits (WR0 - WR4) and three read- 
only bits that are a combinational function of other status bits, 
and which indicate useful branch conditions. The third byte 
consists of ALU status bits plus bits for high-speed multiply 
and divide. The most significant byte holds intermediate nibble 
carries for BCD operations. An extract-status instruction is 
provided which allows a Boolean value to be formed from any 
selected bit. This is particularly useful in machines employing a 
stack architecture. Instructions to save and restore the status 
register are provided. As the entire status of each instruction is 
stored in the status register, interrupts at any microinstruction 
boundary are feasible. 


The processor has a 32-bit wide priority encoder to support 
floating-point and graphics operations. The priority encoder 
supports all byte aligned data types - the result is dependent 
upon the byte width specified. The result of a priority encode is 
also loaded into the position bits of the status register. The 
result of the prioritize operation can then be used in the 
following clock cycle, e.g., to normalize a floating-point num- 
ber or to help detect the edge of a polygon in graphics 
applications. 


To support system diagnostics, the Am29332 has a special 
"Master-Slave" mode. To use this mode, two chips are 
connected in parallel, and hence receive the same instructions 
and data. The master chip is used for the normal data path. 
However, in the slave chip, all outputs become inputs. The 
slave compares the outputs of the master with its own 
internally generated result. If the two do not match, the slave 
will activate an error signal. 


As a further diagnostic aid, byte-wise parity checking is 
performed at both the A and B data inputs. The "parity" signal 
is activated if an error is detected. Parity bits (one per byte) are 
generated for the 32-bit output bus. 


FUNCTIONAL DESCRIPTION 


A detailed description of each functional block is given in the 
following paragraphs. 


64-Bit Funnel Shifter 


The 64-bit funnel shifter is a combinatorial network. The 64-bit 
input is formed from a combination of the A and B inputs. This 
may be left-shifted by up to 31 bits before being used by the 
ALU. The output of the shifter is the most significant 32 bits of 
the result. The 64-bit shifter can be used on either the A or B 
operands to perform barrel shifts (either up or down) or 
rotates. The operation is controlled by positioning operands 
properly at the input of the 64-bit up-shifter. 


The number "n" by which the operand is shifted comes from 
two sources: the microprogram memory via the P0 - P5 pins or 
the internal register (byte 0 of the status register), PR0 - PR5, 
as selected by an instruction bit. 


In general, the 6-bit position input, P0 - P5, takes a 6-bit two's 
complement number representing upshifts from 0 to 31 places 
(positive numbers) or downshifts from 1 to 32 places (negative 
numbers). 
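The shifter behavior described above can be sketched behaviorally as follows. This is an illustration only: the function names are hypothetical, and the way the operand is placed on the two halves of the 64-bit input for each kind of shift is an assumption consistent with the text, not a statement of the chip's internal wiring.

```python
MASK32 = 0xFFFFFFFF

def decode_position(p):
    """Interpret a 6-bit position value as a signed shift count (-32..+31)."""
    p &= 0x3F
    return p - 64 if p & 0x20 else p

def funnel_shift(hi, lo, n):
    """Upshift the 64-bit value hi:lo by n (0..31); keep the upper 32 bits."""
    assert 0 <= n <= 31
    wide = ((hi & MASK32) << 32) | (lo & MASK32)
    return ((wide << n) >> 32) & MASK32

def rotate_up(a, n):
    # Rotate: the same operand is presented on both halves.
    return funnel_shift(a, a, n)

def shift_up(a, n):
    # Upshift with zero fill: operand in the upper half, zeros below.
    return funnel_shift(a, 0, n)

def shift_down(a, n):
    # Downshift by n (1..32) with zero fill: operand in the lower half,
    # upshifted by 32 - n, which is the low five bits of the 6-bit
    # two's complement of n.
    return funnel_shift(0, a, 32 - n)
```

Note that a downshift of n places and an upshift of 32 - n places use the same five low-order position bits; only the operand placement differs.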


Mask Generator 


The mask generator logic provides the ability to generate the 
appropriate mask for an operand of given width and position. 
The generation of the mask depends upon two types of 
instructions. The first type has byte boundary aligned oper- 
ands (widths of either 1, 2, 3 or 4 bytes) with the least 
significant bit aligned to bit 0. The width of an operand is 
specified by the byte width inputs (I8 and I7) as shown in Table 
3. The second type of instruction has operands of variable 
width (1 to 32 bits) and position. The operand is specified by 
the width inputs (W0 - W4) and the position inputs (P0 - P5) 
indicating the least significant bit position of the operand. 
Thus, in this type of instruction the operand may or may not be 
least significant bit aligned. Depending upon the type of 
instruction, the mask generator first generates a fence of all 
zeros starting from the least significant bit with the width 
specified either by the byte width or the width input fields. This 
fence can be upshifted by up to 31 bits by the 32-bit mask 
shifter. Whenever the mask is moved up over the 32-bit 
boundary, it does not wrap around. Instead, ONE's are 
inserted from the least significant end. This configuration 
provides the ability to operate on a contiguous field located 
anywhere in a word, or across a word boundary. 


The mask generator can be used as a pattern generator by 
allowing the mask to pass through ALU (by using the PASS- 
MASK instruction). For example, a single-bit wide mask can be 
generated and by shifting it up by different amounts can give 
walking ONE or walking ZERO patterns for memory tests. 
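A minimal sketch of the mask generation described in this section, assuming the convention stated in the text that a mask bit of 0 marks a selected bit. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def byte_mask(byte_width):
    """Mask for a byte-aligned operand of 1..4 bytes (0 = selected bit)."""
    fence = (1 << (8 * byte_width)) - 1
    return ~fence & MASK32

def field_mask(width, position):
    """Mask for a field of `width` bits (1..32) whose LSB is at `position`
    (0..31). The fence of zeros is upshifted without wrap-around; ONEs
    fill in from the least significant end."""
    fence = (1 << width) - 1
    return ~(fence << position) & MASK32

def walking_zero(position):
    # A single-bit mask passed through the ALU is a walking-ZERO pattern.
    return field_mask(1, position)

def walking_one(position):
    # Its complement is the corresponding walking-ONE pattern.
    return ~walking_zero(position) & MASK32
```

A field that would extend past bit 31, such as `field_mask(8, 28)`, simply loses the part beyond the word boundary, which matches the no-wrap behavior described above.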


TABLE 3. 





Arithmetic and Logical Unit 


The ALU is a three input unit which uses the mask as a second 
or third operand in every instruction. The mask is used to 
merge two operands. For all selected bits (wherever the mask 
is 0), the desired operation specified by the instruction input is 
performed, and for all unselected bits either corresponding 
destination bits or zeros are passed through. The status of 
each operation (carry, negative, zero, overflow, link) applies to 
the result only over the specified width. For all byte aligned 
arithmetic and logical operations (first three quarters of the 
instruction set), the status is extracted from the appropriate 
byte boundary. For all field operations (last quarter of the 
instruction set), the operand width is assumed to be 32 bits for 
status generation. The ZERO flag always indicates the status 
of all bits selected by the mask. 
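The merge controlled by the mask can be sketched as below. This is a behavioral illustration under the assumptions that unselected bits come from the B (destination) input, as the text states for two-operand instructions, and that the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def alu_merge(result, passthrough, mask):
    """Take `result` where the mask is 0 (selected) and `passthrough`
    where the mask is 1 (unselected)."""
    return (result & ~mask & MASK32) | (passthrough & mask)

def and_bytes(a, b, byte_width):
    """Two-operand AND over the low `byte_width` bytes; the remaining
    bytes of Y are passed from B. Also returns the ZERO flag, which
    looks only at the selected bits."""
    mask = ~((1 << (8 * byte_width)) - 1) & MASK32
    y = alu_merge(a & b, b, mask)
    zero = int((y & ~mask & MASK32) == 0)
    return y, zero
```

With a byte width of one, `and_bytes(0x123456F0, 0xABCDEF3C, 1)` ANDs only the low bytes (0xF0 and 0x3C) and returns the upper three bytes of B unchanged.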


The actual width of the ALU is 34 bits. There are two extra bits 
used for the high speed signed and unsigned multiplication 
instructions. These two bits are automatically concatenated to 
the most-significant end of the ALU depending upon the width 
specified for the operation. Since the modified Booth algorithm 
requires a two-bit down-shift each cycle, these ALU bits 
generate the two most-significant bits of the partial product. 
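As background for the two-bit-at-a-time multiply, the following is a generic sketch of modified Booth (radix-4) recoding for signed operands. It illustrates the arithmetic only; it is not the Am29332 microcode sequence, it does not model the Q-register or the two extra ALU bits, and the function name is hypothetical.

```python
# Booth digit selected by the bit triple (b[2i+1], b[2i], b[2i-1]).
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_radix4_multiply(multiplicand, multiplier, bits=32):
    """Multiply two signed integers, consuming two multiplier bits per step.
    `bits` must be even; `multiplier` is read as a `bits`-wide
    two's-complement value."""
    m = multiplier & ((1 << bits) - 1)
    product = 0
    prev = 0                          # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        pair = (m >> i) & 0b11
        digit = BOOTH_DIGIT[(pair << 1) | prev]
        product += (digit * multiplicand) << i
        prev = pair >> 1              # top bit of this pair feeds the next step
    return product
```

Each step adds 0, ±1 or ±2 times the multiplicand, which is why a 32-bit multiply needs only 16 iterate steps.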


The ALU is capable of shifting data down by two bits for the 
multiplication algorithm, up by one bit for the divide algorithm 
and single-bit upshifts. 


The processor is capable of performing BCD arithmetic on 
packed BCD numbers. The ALU has separate carry logic for 
BCD operations. This logic generates nibble carries (BCD digit 
carry) from propagate and generate signals formed from the A 
and B operands. In order to simplify the hardware while 
maintaining throughput, the BCD add and subtract operations 
are performed in two cycles. In the first cycle, ordinary binary 
addition or subtraction is performed and BCD nibble carries 
are generated. These are blocked from affecting the result at 
this stage, but are saved in the status register to be used later 
for BCD correction (NC0 - NC7). In the second cycle all BCD 
numbers are adjusted by examining the previously generated 
nibble carries. Since all the necessary information is stored in 
the status register, the processor can be interrupted after the 
first BCD cycle. 
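The two-cycle BCD addition can be illustrated with the sketch below. It assumes valid packed BCD inputs and follows the description in the text: a binary add that records one decimal carry per digit, followed by a correction that adds 6 in every digit position whose nibble carry is set. The function names are hypothetical and the digit-serial loop stands in for the chip's parallel carry logic.

```python
MASK32 = 0xFFFFFFFF

def bcd_add_first_cycle(a, b, digits=8):
    """Cycle 1: ordinary binary add plus the BCD nibble carries (NC0..NC7)."""
    binary_sum = (a + b) & MASK32
    nibble_carries = 0
    carry = 0
    for i in range(digits):
        da = (a >> (4 * i)) & 0xF
        db = (b >> (4 * i)) & 0xF
        carry = 1 if da + db + carry > 9 else 0   # decimal carry out of digit i
        nibble_carries |= carry << i
    return binary_sum, nibble_carries

def bcd_sum_correct(binary_sum, nibble_carries, digits=8):
    """Cycle 2: add 6 at every digit whose nibble carry was set."""
    correction = 0
    for i in range(digits):
        if (nibble_carries >> i) & 1:
            correction |= 6 << (4 * i)
    return (binary_sum + correction) & MASK32
```

Because the intermediate state is just the binary sum and the saved nibble carries, the two steps can be separated by other work, as the text notes for interrupts.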


Priority Encoder 


The priority encoder is provided to support floating-point 
arithmetic and some graphics primitives. The priority encoder 
takes up to 32 bits as input and generates a 5-bit wide binary 
code to indicate location of the most significant one in the 
operand. Input to the priority encoder comes from the input 
multiplexer, which masks all bits that the user does not want to 
participate in the prioritization. The priority encoder supports 8, 
16, 24 and 32-bit operations depending upon the byte width 
specified. For each data type the priority encoder generates 
the appropriate binary weighted code. For example, when a 
byte width of two is specified (I7-I8 = 10), the output of the 
encoder is zero when bit 15 is HIGH. However, if byte width of 
four is specified (I8-I7 = 00), the output of encoder is 16 
(decimal) if bit 15 is HIGH and bits 31-16 are LOW. Table 4 
shows the output for each data type. If none of the inputs are 
HIGH or the most significant bit of the data type specified is 
HIGH, then the output is zero. The difference between these 
two cases is indicated by the Z-flag of the status register which 
is HIGH only if all inputs are zero. 
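A behavioral sketch of the priority encoder as described above: the output is the distance of the most significant one from the top bit of the selected data type, and the Z flag distinguishes an all-zero input from an input whose top bit is set. The function name is hypothetical.

```python
def prioritize(a, byte_width):
    """Return (code, z) for an operand of 1..4 bytes.
    code = number of leading zeros within the selected width,
    z    = 1 only if every selected bit is zero."""
    width = 8 * byte_width
    field = a & ((1 << width) - 1)
    if field == 0:
        return 0, 1
    return width - field.bit_length(), 0
```

The result can be used directly as a normalization shift count: upshifting the operand by `code` places brings its most significant one to the top of the data type.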


Q-Register 


The Q-register holds dividend and quotient bits for division, 
and multiplier and product bits for multiplication. During 
division, the contents of the Q-register are shifted left, a bit at 
a time, with quotient bits inserted into bit 0. During multiplica- 
tion, the contents of the Q-register are shifted right, two bits at 
a time, with product bits inserted into the most-significant two 
bits (according to the selected byte width). The Q-register may 
be loaded from the A or B inputs and read onto the Y bus. 


Master-Slave Comparator 


All ALU outputs (except MSERR) employ three-state buffers. 
The master-slave comparator compares the input and output 
of each buffer. Any difference causes the MSERR signal to be 
made true. In Slave mode, all output buffers are disabled. 
Outputs from a second ALU may then be connected to the 
equivalent pins of the first. The comparator in the slave will 
then detect any difference in the results generated by the two. 
When the Y bus is three-stated by making Output-Enable 
false, the Y bus master-slave comparators are disabled. 


Parity Logic 


For each byte of the DA and DB inputs there is an associated 
parity bit (8 in all). If a parity error is detected on any byte, the 
Parity-Error signal is made true. Four parity signals (one per 
byte) are also generated for the Y bus outputs. EVEN parity is 
employed for the Am29332. 
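The byte-wise even parity scheme can be sketched as follows. Even parity is taken here in its usual sense: the parity bit is chosen so that each byte plus its parity bit contains an even number of ones. The function names and the list ordering (index 0 = least significant byte) are assumptions for the illustration.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of ones even."""
    return bin(byte & 0xFF).count("1") & 1

def bus_parity(word):
    """Four parity bits for a 32-bit word, index 0 = least significant byte."""
    return [even_parity_bit((word >> (8 * i)) & 0xFF) for i in range(4)]

def parity_error(word, parity_bits):
    """True if any byte disagrees with its accompanying parity bit."""
    return bus_parity(word) != list(parity_bits)
```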


Status Register 


All necessary information about operations performed in the 
ALU is stored in the 32-bit wide status register after every 
microcycle. Since the register can be saved, an interrupt can 
occur after any cycle. The status register can be loaded from 
either the A or B input of the chip and can be read out on the Y 
bus for saving in an external register file. For loading, the byte 
width indicates how many bytes are to be updated. The status 
register is only updated if the HOLD input is inactive. 


Each byte of the status register holds different types of 
information (see Figure 3). The least significant byte (bits 0 to 
7) holds eight position bits (PR0 - PR7) for the data shifter. 
The two most significant bits are not used. The next most 
significant byte (bits 8 to 15) holds the 5-bit width field 
(WR0 - WR4) for the mask generator. The three most-signifi- 
cant bits of that byte (bits 13 to 15) are read-only bits that 
represent three different conditions extracted from the other 
bits of the status register. They are C+Z, N ⊕ V, and (N ⊕ 
V) + Z for bits 13, 14 and 15 respectively. These bits can be 
read on the Y0 pin by the extract-status instruction. The next 
byte contains all the necessary information generated by an 
ALU operation. The least-significant four bits (bits 16 to 19) 
hold carry, negative, overflow and zero flags. Bit 20 holds link 
information for single bit shifts and bits 21 and 22 are used by 
the multiply and divide instructions. The M flag holds the 
multiplier bit for the modified Booth algorithm or it holds the 
sign comparison result for the divide algorithm. The S flag 
holds the sign of the partial remainder for unsigned division. 
Both the flags (M and S) are provided as a part of the status 
register so that multiply and divide instructions can be inter- 
rupted at microinstruction boundaries. The most significant 
byte of the status register holds nibble carries for BCD 
arithmetic. Since BCD arithmetic is performed in two cycles, 
the nibble carries are saved in the first cycle and used in the 
second cycle. Since all the information is stored, BCD instruc- 
tions are also interruptible at the microinstruction boundary. 
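The status register layout described above can be unpacked with a short sketch. Bit positions follow the text (carry, negative, overflow, zero in bits 16 to 19; link in bit 20; M and S in bits 21 and 22); the function and key names are hypothetical.

```python
def decode_status(status):
    """Split a 32-bit status word into its documented fields."""
    bit = lambda n: (status >> n) & 1
    return {
        "position": status & 0xFF,          # bits 0-7 (two MSBs unused by the ALU)
        "width": (status >> 8) & 0x1F,      # bits 8-12
        "C": bit(16), "N": bit(17), "V": bit(18), "Z": bit(19),
        "L": bit(20), "M": bit(21), "S": bit(22),
        "nibble_carries": (status >> 24) & 0xFF,   # bits 24-31
    }

def branch_conditions(c, n, v, z):
    """Read-only bits 13, 14, 15: C+Z, N xor V, (N xor V)+Z."""
    return c | z, n ^ v, (n ^ v) | z
```

The three derived bits are combinational functions of the flags, so they never need to be written when the status register is restored.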
TABLE 4. 

Byte Width             Highest Active Bit    Output 
I7-I8 = 00 (32-bit)    None                  0 
                       31                    0 
                       30                    1 
                       29                    2 
                       28                    3 
                       .                     . 
I7-I8 = 01 (8-bit)     None                  0 
                       .                     . 
I7-I8 = 10 (16-bit)    None                  0 
                       15                    0 
                       14                    1 
                       13                    2 
                       12                    3 
                       .                     . 
I7-I8 = 11 (24-bit)    None                  0 
                       23                    0 
                       22                    1 
                       21                    2 
                       20                    3 
                       .                     . 


Bits 7 to 0 
  Status 0-7: Position Register 

Bits 15 to 8 
  Status 8-12: Width Register 
  Status 13: C+Z             (Read Only) 
  Status 14: N ⊕ V           (Read Only) 
  Status 15: (N ⊕ V)+Z       (Read Only) 
  Bits 15, 14 and 13 are labeled SIGNED, SIGNED and UNSIGNED. 

Bits 23 to 16 
  Status 16: Carry 
  Status 17: Negative 
  Status 18: Overflow 
  Status 19: Zero 
  Status 20: Link 
  Status 21: Multiply (and divide) Bit 
  Status 22: Sign Flag 
  Status 23: 0 

Bits 31 to 24 
  Status 24-31: Nibble Carries 

Note: Overflow is defined as follows: 
V = (carry in to MSB) ⊕ (carry out of MSB) 


Figure 3. ALU Status Register Bit Assignment 










Am29332 INSTRUCTION SET 
Data Types 
The Am29332 supports the following data types: 


1. Integer 
2. Binary-coded decimal 
3. Variable-length bit field 


The first two data types fall into the category of byte boundary 
aligned operands (Figure 4). The size of the operand could be 
1 byte, 2 bytes, 3 bytes or 4 bytes. All operands are least 
significant bit (bit 0) aligned. The byte width is determined by 
bits I8 and I7 of the instruction as shown in Table 5. 


TABLE 5. 


I8   I7   Width in Bytes 
0    0    4 
0    1    1 
1    0    2 
1    1    3 





The third data type has operands of variable width (1 to 32 
bits) as shown in Figure 4. The operand is specified by width 
inputs (W0 - W4) and position inputs (P0 - P5). The position 
inputs indicate the least significant bit position of the operand. 
Depending on bits I8 and I7 of the instruction, the width and 
position inputs can be selected from either the Status Register 
or the Width and Position Pins as shown in Table 6. A 
summary of the data types available is illustrated in Table 7. 


[Figure 4, first panel: Byte Boundary Aligned Operands. Operands of 1, 2, 3 or 4 bytes, aligned to bit 0.] 

[Figure 4, second panel: Variable-Length Bit Field. The field occupies bits p + w - 1 down to p of the word.] 

p = Bit displacement of the least significant bit of the field 
with respect to bit 0. 
w = Width of bit field. 


Figure 4. Data Types 


TABLE 6. 


Data Type   Size                       Signed                Unsigned 
Integer     1 byte   8 bits            -128 to +127          0 to 255 
            2 bytes  16 bits           -2^15 to +2^15 - 1    0 to 2^16 - 1 
            3 bytes  24 bits           -2^23 to +2^23 - 1    0 to 2^24 - 1 
            4 bytes  32 bits           -2^31 to +2^31 - 1    0 to 2^32 - 1 

BCD         1 to 4 bytes (8 digits)    Numeric, 2 digits per byte. 
                                       Most-significant digit may be 
                                       used for sign. 

Variable    1 to 32 bits               Dependent on position and 
                                       width inputs. 





Instruction Format 
The Am29332 has two types of Instruction Formats: 
1. Byte Boundary Aligned Instructions (FORMAT 1): 


I8 I7 I6 ... I0 


2. Variable-Length Field Bit Instructions (FORMAT 2): 


I8 I7 I6 ... I0 


For instructions that allow a field to be shifted up or down, 
P0-P5 is a two's-complement number in the range -32 to 
+31 representing the direction and magnitude of the shift. For 
instructions that assume a fixed field position, P0 - P4 repre- 
sent the position of the least-significant bit of the field and P5 
is ignored. 


Instruction Classification 
ALU instructions can be classified as follows: 
A. Byte Boundary Aligned Operand Instructions: 
1. Arithmetic 
   - Binary, BCD 
   - Multiply steps 
   - Division steps (single and multiple precision) 
2. Prioritize 
3. Logical 
4. Single-bit shifts 
5. Data movement 
B. Variable-Length Bit Field Operand Instructions: 
1. N-bit shifts and rotates 
2. Bit manipulations 
3. Field logical operations (aligned, non-aligned, extract) 
4. Mask generation 


Three-fourths of the ALU instructions apply to operands that 
are byte boundary aligned. For these instructions, two orthog- 
onal issues are the width of the operand (in bytes) and the 
contents of the high order unselected bytes on the Y bus. As 
mentioned earlier, the width of the operand is specified by I8 
and I7. With the exception of a few instructions, the unselected 
bytes are assigned values as follows: for single operand 
instructions, unselected bytes are passed unchanged from the 
source (A or B). For two operand instructions, unselected 
bytes are passed unchanged from the destination (B input). 


In the last quarter of the instruction set, the width of the 
operand is from 1 to 32 bits (based on the width input) for field 
operations, 32 bits for N-bit shift operations and 1-bit for bit- 
oriented operations. In the case of field-aligned and single-bit 
operands, the position bits (P0 - P4) determine the least 
significant bit of the operand. In the case of N-bit shifts and 
field non-aligned operands, the position bits P0 - P5 form a 6-bit 
signed integer determining the magnitude and direction of the 
shift. 


Flags 
Byte-Aligned Instructions 
The zero flag always looks only at the selected bytes: 


Z ← (Y and bytemask (byte width) = 0) 


Similarly, N ← sign-bit (Y, byte width), where the function 
"sign-bit" returns bit 7, 15, 23, or 31 of the first argument for 
byte widths 01, 10, 11, or 00 respectively. 


Also, C ← carry (byte width) returns the carry from the 
appropriate byte boundary, and: 


V ← overflow (byte width) = (carry into MSB) ⊕ (carry 
out of MSB) 


returns the overflow from the appropriate byte boundary. 
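The flag definitions for byte-aligned instructions can be checked with a small sketch of an add at a chosen byte width. The function name and the dictionary return format are hypothetical; the flag formulas follow the definitions given in this section.

```python
def add_with_flags(a, b, byte_width, carry_in=0):
    """Add the low `byte_width` bytes of a and b and derive Z, N, C, V
    at the corresponding byte boundary."""
    bits = 8 * byte_width
    mask = (1 << bits) - 1
    low = mask >> 1                       # every selected bit except the MSB
    a &= mask
    b &= mask
    total = a + b + carry_in
    y = total & mask
    c = (total >> bits) & 1               # carry out of the MSB
    carry_into_msb = (((a & low) + (b & low) + carry_in) >> (bits - 1)) & 1
    return {
        "Y": y,
        "Z": int(y == 0),
        "N": (y >> (bits - 1)) & 1,       # bit 7, 15, 23 or 31
        "C": c,
        "V": carry_into_msb ^ c,
    }
```

For a one-byte add, 0x7F + 0x01 sets N and V (signed overflow) but not C, while 0xFF + 0x01 sets C and Z but not V.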


The link (L) flag is generally loaded with the bit moved out of 
the highest selected byte in the case of upshifts, or the bit 
moved out of the least significant byte for downshifts. Figure 5 
shows the shift operation using link bit. Other status flags have 
specialized uses, explained in the following sections. 


[Figure 5 diagram: shift up and shift down through the link bit across the selected 1, 2, 3, or 4 bytes.] 


Figure 5. Upshift/Downshift Using Link Bit 


Variable-Length Field Instructions: 


Generally, only N and Z are affected. N takes the most- 
significant bit of the 32-bit result (i.e., N ← Y31). Z detects 
zeros in the selected field of the result (i.e., Z ← (Y and 
bitmask (position, width) = 0)). 


Output Select 


The Register Status pin, RS, may be used to switch the C, Z, 
N, V, and L output pins between the direct output of the ALU 
and the outputs of the corresponding bits in the status register. 
If the direct status output is selected, then for instructions that 
do not affect a particular flag (e.g., carry for logical arithmetic) 
that output will reflect the state of its corresponding bit in the 
status register. Similarly, when the HOLD signal is made 
HIGH, the C, Z, N, V and L pins will be made equal to the 
contents of the status register, regardless of the RS input. 
INSTRUCTION SET SUMMARY 





Operand Size: Variable Byte Width: 1, 2, 3, 4 Bytes 

Arithmetic (Data Type: Binary Integer and BCD) 
• Increment by one, two, four 
• Decrement by one, two, four 
• Add, addc (carry = macro/micro) 
• Sub, subr 
• Subc, subrc (carry/borrow) 
• BCD sum and difference correct steps 

Arithmetic (Data Type: Binary Integer) 
• Negate (two's complement) 
• Multiply steps (modified Booth) (signed and unsigned) 
• Divide steps (non-restoring) 

Single-Bit Shifts (Data Type: Binary) 
• Upshift with 0, 1, link fill 
• Downshift with 0, 1, link, sign fill 
  (single and double precision) 

Data Movement (Data Type: Binary) 
• Zero extend 
• Sign extend 
• Pass-status, Q-Reg 
• Load-status, Q-Reg 
• Merge 


Operand Size: 32 Bits 

N-Bit Shifts and Rotates (Data Type: Binary) 
• Upshift by 0 to 31 bits with 0 fill 
• Downshift by 1 to 32 bits with 0, sign fill 
• Rotate by 0 to 31 bits 


Operand Size: Single Bit 

Bit Manipulation (Data Type: Binary) 
• Extract, set, reset 


Operand Size: Variable Length Bitfield: 1 to 32 Bits 

Field Logical (aligned and non-aligned) (Data Type: Binary) 
• Not, OR, XOR, AND, extract, insert 


INSTRUCTION SET GLOSSARY 


(Sorted by Opcode in Hex Notation) 


ZERO-EXTA 
ZERO-EXTB 
SIGN-EXTA 
SIGN-EXTB 
PASS-STAT 
PASS-Q 
LOADQ-A 
LOADQ-B 
NOT-A 
NOT-B 
NEG-A 
NEG-B 
PRIOR-A 
PRIOR-B 
MERGEA-B 
MERGEB-A 


DECR-A 
DECR-B 
INCR-A 
INCR-B 
DECR2-A 
DECR2-B 
INCR2-A 
INCR2-B 
DECR4-A 
DECR4-B 
INCR4-A 
INCR4-B 
LDSTAT-A 


DN1-0F-A 
DN1-0F-B 
DN1-0F-AQ 
DN1-0F-BQ 
DN1-1F-A 
DN1-1F-B 
DN1-1F-AQ 
DN1-1F-BQ 
DN1-LF-A 
DN1-LF-B 
DN1-LF-AQ 
DN1-LF-BQ 
DN1-AR-A 
DN1-AR-B 
DN1-AR-AQ 
DN1-AR-BQ 


UP1-0F-A 
UP1-0F-B 
UP1-0F-AQ 
UP1-0F-BQ 
UP1-1F-A 
UP1-1F-B 
UP1-1F-AQ 
UP1-1F-BQ 
UP1-LF-A 
UP1-LF-B 
UP1-LF-AQ 
UP1-LF-BQ 
ZERO 
SIGN 

OR 

XOR 
SUM-CORR-A 
SUM-CORR-B 
DIFF-CORR-A 
DIFF-CORR-B 


SDIVFIRST 
UDIVFIRST 


SDIVSTEP 
SDIVLAST1 
MPDIVSTEP1 
MPSDIVSTEP3 
UDIVSTEP 
UDIVLAST 
MPDIVSTEP2 
MPUDIVSTP3 
REMCORR 
QUOCORR 
SDIVLAST2 
UMULFIRST 
UMULSTEP 
UMULLAST 
SMULSTEP 
SMULFIRST 







NB-SN-SHA 
NB-SN-SHB 
NB-0F-SHA 
NB-0F-SHB 
NBROT-A 
NBROT-B 
EXTBIT-A 
EXTBIT-B 
SETBIT-A 
SETBIT-B 
RSTBIT-A 
RSTBIT-B 
SETBIT-STAT 
RSTBIT-STAT 
NOTF-AL-B 
PASSF-AL-B 


NOTF-A 
NOTF-AL-A 
PASSF-A 
PASSF-AL-A 
ORF-A 
ORF-AL-A 
XORF-A 
XORF-AL-A 
ANDF-A 
ANDF-AL-A 
EXTF-A 
EXTF-B 
EXTF-AB 
EXTF-BA 
EXTBIT-STAT 
PASS-MASK 
















TABLE 6-1. DATA MOVEMENT INSTRUCTIONS 


                                          Y Output 
Mnemonic     Code   Description        Unsel    Sel 
ZERO-EXTA    00     Zero Extend        0        A 
ZERO-EXTB    01     Zero Extend        0        B 
SIGN-EXTA    02     Sign Extend        Sign     A 
SIGN-EXTB    03     Sign Extend        Sign     B 
MERGEA-B     0E     Merge A with B     B        A 
MERGEB-A     0F     Merge B with A     A        B 

TABLE 6-2. DATA MOVEMENT INSTRUCTIONS 




TABLE 6-3. DATA MOVEMENT INSTRUCTIONS 


Mnemonic     Code   Description 
PASS-Q       05     Pass Q Register 
LOADQ-A      06     Load Q Register from A 
LOADQ-B      07     Load Q Register from B 


Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 


























Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 

A=A Input 

B =B Input 

Q=Q Register 

as Updated only if byte width is 3 or 4 
= Updated 
Examples: 
2, ZERO-EXTB Pass lower two bytes of B to Y with zero fill on upper two bytes. 


0, LOADQ-A Load all four bytes of A into the Q Register; pass the updated Q Register to Y. 
TABLE 7. LOGICAL INSTRUCTIONS 


Mnemonic     Code   Description 
NOT-A        08     One's Complement of A 
NOT-B        09     One's Complement of B 
ZERO         3C     Pass Zero 
SIGN         3D     Pass Sign 
OR           3E     OR 
XOR          3F     XOR 
AND          40     AND 


Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 





Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 


A=A Input 
B=B Input 
Q=Q Register 
* = Updated 
Examples: 
2, NOT-A Complement low order two bytes of A and output to Y with 
high order two bytes of A uncomplemented. 
1, AND AND first byte of A and B. Output to Y with high three 
bytes of B. 


TABLE 8-1. SINGLE-BIT SHIFT INSTRUCTIONS (SINGLE PRECISION) 


                                               Y Output 
Mnemonic    Code   Description             Unsel   Sel 
DN1-0F-A    20     Downshift, Zero Fill    A       Yi = Ai+1, Ymsb = 0 
DN1-0F-B    21     Downshift, Zero Fill    B       Yi = Bi+1, Ymsb = 0 
DN1-1F-A    24     Downshift, One Fill     A       Yi = Ai+1, Ymsb = 1 
DN1-1F-B    25     Downshift, One Fill     B       Yi = Bi+1, Ymsb = 1 
DN1-LF-A    28     Downshift, Link Fill    A       Yi = Ai+1, Ymsb = Link 
DN1-LF-B    29     Downshift, Link Fill    B       Yi = Bi+1, Ymsb = Link 
DN1-AR-A    2C     Downshift, Sign Fill    A       Yi = Ai+1, Ymsb = Sign 
DN1-AR-B    2D     Downshift, Sign Fill    B       Yi = Bi+1, Ymsb = Sign 
UP1-0F-A    30     Upshift, Zero Fill      A       Yi = Ai-1, Y0 = 0 
UP1-0F-B    31     Upshift, Zero Fill      B       Yi = Bi-1, Y0 = 0 
UP1-1F-A    34     Upshift, One Fill       A       Yi = Ai-1, Y0 = 1 
UP1-1F-B    35     Upshift, One Fill       B       Yi = Bi-1, Y0 = 1 
UP1-LF-A    38     Upshift, Link Fill      A       Yi = Ai-1, Y0 = Link 
UP1-LF-B    39     Upshift, Link Fill      B       Yi = Bi-1, Y0 = Link 


Note: 1. These instructions use the byte aligned instruction format (FORMAT 1). 





















Example: 
2, UP1-1F-A Shift lower two bytes of A up one bit. Set LSB to 1. The 
unselected upper two bytes are passed from A. 
TABLE 8-2. SINGLE-BIT SHIFT INSTRUCTIONS (DOUBLE PRECISION) 


Mnemonic     Code   Description             Note 
DN1-0F-AQ    22     Downshift, Zero Fill    2) 
DN1-0F-BQ    23     Downshift, Zero Fill    3) 
DN1-1F-AQ    26     Downshift, One Fill     2) 
DN1-1F-BQ    27     Downshift, One Fill     3) 
DN1-LF-AQ    2A     Downshift, Link Fill    2) 
DN1-LF-BQ    2B     Downshift, Link Fill    3) 
DN1-AR-AQ    2E     Downshift, Sign Fill    2) 
DN1-AR-BQ    2F     Downshift, Sign Fill    3) 
UP1-0F-AQ    32     Upshift, Zero Fill      2) 
UP1-0F-BQ    33     Upshift, Zero Fill      3) 
UP1-1F-AQ    36     Upshift, One Fill       2) 
UP1-1F-BQ    37     Upshift, One Fill       3) 
UP1-LF-AQ    3A     Upshift, Link Fill      2) 
UP1-LF-BQ    3B     Upshift, Link Fill      3) 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 
2. Y Unselected byte from A, Q Unselected byte unchanged. 
3. Y Unselected byte from B, Q Unselected byte unchanged. 





Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A=A Input 
B =B Input 
Q=Q Register 
* = Updated 


Examples: 
0, DN1-AR-BQ Shift 64 bits (all 32 bits of both B and Q) 
down by one bit. LSB of B fills MSB of Q. 
MSB of B set to sign bit (bit N of status register). 


3, UP1-LF-AQ Shift 48 bits (24 bits of A and 24 bits of Q) 
up by one bit. MSB of 24-bit Q fills LSB of A. 
MSB of 24-bit A sets link status bit. LSB of 
Q is filled with original link value. 
TABLE 9. PRIORITIZE INSTRUCTIONS 


Mnemonic    Code   Description       Y Output 
PRIOR-A     0C     Prioritization    Location of Highest 1 Bit 
PRIOR-B     0D     Prioritization    Location of Highest 1 Bit 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 
2. Priority also loaded into STATUS <7:0> 
3. Refer to Table 4. 


Legend: A=A Input 
B=B Input 
Q=Q Register 
* = Updated 
Example: 
3, PRIOR-A Assume A is 01001011 00100010 00000000 00000000. 
Value placed on Y is 2. 


TABLE 10-1. ARITHMETIC INSTRUCTIONS 


                                         Y Output 
Mnemonic    Description            Unsel   Sel 
NEG-A       Two's Complement       A       -A 
NEG-B       Two's Complement       B       -B 
INCR-A      Increment by One       A       A+1 
INCR-B      Increment by One       B       B+1 
INCR2-A     Increment by Two       A       A+2 
INCR2-B     Increment by Two       B       B+2 
INCR4-A     Increment by Four      A       A+4 
INCR4-B     Increment by Four      B       B+4 
DECR-A      Decrement by One       A       A-1 
DECR-B      Decrement by One       B       B-1 
DECR2-A     Decrement by Two       A       A-2 
DECR2-B     Decrement by Two       B       B-2 
DECR4-A     Decrement by Four      A       A-4 
DECR4-B     Decrement by Four      B       B-4 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 
2. Borrow, rather than carry, is generated if BOROW is HIGH (borrow = carry). 
3. Nibble bits are set by these instructions. NEG-A (or NEG-B) and DIFF-CORR may be used to 
form 10's complement of a BCD number. Use SUM-CORR (for increment) or DIFF-CORR (for 
decrement) to increment or decrement a BCD number. 





Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A=A Input 
B=B Input 
Q=Q Register 
* = Updated 


Example: 
2, DECR4-A Decrement lower two bytes of A by 4 
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TABLE 10-2. ARITHMETIC INSTRUCTIONS 


| Youtput | Status 
Oe ear emo 
42 


Subtract with Carry A+B+1+C 2) 6) 
B+A+1+C 2) 6) 


8 | Correct BCD Nibbles A 
SUM-CORR-B for Addition 
DIFF-CORR-A 44 | Correct BCD Nibbles A |Corrected A 3) ] | | [r{r{r{r. 
SIFF-CORRS for Subtraction Poorected 8) PP PPP 


Notes: 1. These instructions use the byte aligned instruction format (FORMAT 1). 

2. BOROW is LOW. For subtract operations, a borrow rather than a carry is stored in STATUS if BOROW is HIGH. 
Carry is always generated for ADD regardless of BOROW. 

3. First, the nibble carries NCo-NC7 are tested. Any nibble carry/borrow that is set to 1 generates ''6" internally as 
a correction word and then the correction word is added (SUM-CORR- ) or subtracted (DIFF-CORR- ) from the 
operand. NCg-NC7 are not affected by this operation. 

4. Use SUM-CORR or DIFF-CORR to add or subtract a BCD number. 

5. Use ADDC, SUBC, or SUBRC to perform operations on integers longer than 32 bits. 

6. Carry bit is obtained from MCin if M/m is HIGH. Otherwise, carry is obtained from the C status bit. 





ADD 
ADDC 
SUB
SUBR 
SUBC 
SUBRC 
SUM-CORR-A 





> 


A+B+t+C 6) 






4 
4 


3 
4 
6 
5 


4 


” 
wo 
w co N 





Legend: Unsel = Unselected Byte(s) 
Sel = Selected Byte(s) 
A=A Input 
B=B Input 
Q=Q Register 


* = Updated only if byte width is 3 or 4 


Example: 
0, ADD Add two 32-bit two's-complement integers 
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TABLE 11-1. DIVIDE INSTRUCTIONS (Aligned Format) 


Code Description eee 5 is{miu|z{vinie| 
| Signed Divide Steps 


Signed Divide Steps 


Tsoweinst [4 €  | Fl isiucton for Signed wide SST dT vO TY 
Tsoverer [50 | Nerate Step (#bis- time SS] ST vO PP 
soivcasti_| 51 | Last Ovide Instruction Uniess SSS vO | 
SDVLASTE eee 


Unsigned Divide Steps 


Multiprecision Divide Steps 
Used for Unsigned Divide 

Correction Steps 
REMCORR | 58 | Correct Remainder After Divide
QUOCORR | 58 | Correct Quotient After Divide


TABLE 11-2. EXAMPLE CODING FORM (Signed Division) 





<|<|< 


Y,Q 


2 
ef 
ze 
ie 
ae 
a 
x 
ma 
cea 








Am29332 Y-Out 


LOADQ-A
SIGN
SDIVFIRST
SDIVSTEP
SDIVLAST1


SDIVLAST2A 
ASS-Q 


| 2 | REMCORR 




Note: Divisor in A, Dividend in A 
Quotient in Q, Remainder in B 


Legend: A =A Input 
B =B Input ; 
S = Status Register 


Q=Q Register 
R1 = Quotient 
R2 = Dividend 
R3 = Remainder 
R4 = Divisor 
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i 
"i 


TABLE 12-1. MULTIPLY INSTRUCTIONS (Aligned Format) | 


7 gece sae 
Code a a s[m[t]z|v|nlo 
ante UO a 


Signed Multiply. Steps 


Pswuiriest [5 | Fist mulipy metuion ——SCS—C~ir SS dT 
Pawuuster [8 | tote stp (woise-teea 8 TT 
[owuriest [58 | Frat milipy meivion ——SCSC~i SS dT 
Pomuister | 5C | terate step (#ois/2-ise i eT 
Comuucast [80 | test mutipy instucton 


TABLE 12-2. EXAMPLE CODING FORM (Unsigned Multiply) 


Lo Am29332 tt 


Cond 
Branch Select B/W a ee 


cca eae eae ee 2 2 ee 

Ta Ome Nees GR 35 LO ec AE ls i 
FORD | tt | | 8 eMurinst | ||| | 
Ps} | 8 Muster | | 
Poort | mast Tes Ts 
CON eS a an Re ae eee 


Note: 1. Put ALU output in B.
2. Multiplicand in A, Multiplier in Q
Product (HIGH) in B, Product (LOW) in Q





Am29332 Y-Out 


Legend: A=A Input 

B=B Input 
S = Status Register 
Q=Q Register 

R1 = Multiplier 

R2 = Multiplicand 

R3 = Product (HIGH) 

R4 = Product (LOW) 
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TABLE 13. SHIFT/ROTATE INSTRUCTIONS 


| | Status 
[mentee] omen | vomme TTS 


enn eee 
Vie p= 8) 0 oD ee eee 
ee ne eee 
Y(i+p) = Bi
Field Rotate    Yi = A((i - p) mod 32)   3)
                Yi = B((i - p) mod 32)   3)


Notes: 1. These instructions use the field instruction format (FORMAT 2).
2. "p" stands for bit displacement from P0-P5 or from PR0-PR5 (-32 ≤ p ≤ 31).
If p is positive, Y(p-1) to Y0 are equal to the fill bit.
If p is negative, Y31 to Y(31+p+1) are equal to the fill bit.
3. The sign of the position input is ignored for this instruction and P0-P4 are treated as a positive magnitude for a
circular upshift.
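
The shift and rotate rules in Notes 2 and 3 can be sketched in software as follows. The function names are hypothetical, the fill bit is passed in by the caller (0 for zero fill, the operand's sign bit for sign fill), and the rotate amount is taken as an already-decoded magnitude of 0 to 31; how the hardware encodes the position input is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def n_bit_shift(x, p, fill=0):
    """Shift a 32-bit word up (p > 0) or down (p < 0), filling vacated bits.

    Up-shift:   Y(p-1)..Y0 take the fill bit.
    Down-shift: Y31..Y(31+p+1) take the fill bit.
    """
    if not -32 <= p <= 31:
        raise ValueError("p must satisfy -32 <= p <= 31")
    x &= MASK32
    if p >= 0:
        y = (x << p) & MASK32
        if fill:
            y |= (1 << p) - 1                # low p bits set to 1
    else:
        n = -p
        y = x >> n
        if fill:
            y |= MASK32 ^ (MASK32 >> n)      # high n bits set to 1
    return y


def rotate_up(x, amount):
    """Circular up-shift: Yi = X((i - amount) mod 32)."""
    if not 0 <= amount <= 31:
        raise ValueError("amount must be 0..31")
    x &= MASK32
    return ((x << amount) | (x >> (32 - amount))) & MASK32
```

A sign-filled down-shift is obtained by passing `fill=(x >> 31) & 1`.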















Legend: A=A Input 


B =B Input 
Q=Q Register 
* = Updated 
Examples: * 
NB-OF-SHA,,4 Shift A up 4 bits and zero fill 


NB-OF-SHB,,-17 Shift B down 17 bits and sign fill 


*Width field not used 


TABLE 14-1. BIT-MANIPULATION INSTRUCTIONS 


Instruction   Code   Description   Y Output

SETBIT-A      68     Bit Set       Yi = Ai, Yp = 1
SETBIT-B      69                   Yi = Bi, Yp = 1
RSTBIT-A                           Yi = Ai, Yp = 0
RSTBIT-B                           Yi = Bi, Yp = 0

EXTBIT-A             Bit Extract   if p ≥ 0, Y0 = Ap 2)
                                   if p < 0, Y0 = NOT Ap

EXTBIT-B      67                   if p ≥ 0, Y0 = Bp 2)
                                   if p < 0, Y0 = NOT Bp

EXTBIT-STAT   7E                   if p ≥ 0, Y0 = Sp 2)
                                   if p < 0, Y0 = NOT Sp


Notes: 1. These instructions use the field instruction format (FORMAT 2).
2. Y31 to Y1 are set to zero. "p" stands for the bit displacement from P0-P4 or from PR0-PR5. The sign of the position input is
ignored.





TABLE 14-2. BIT-MANIPULATION INSTRUCTIONS 















| Status 
Description Status Register Y Output 's|m/L{z/v|nic | 


Pserarstat | ec [Sauserse | s=t fs PP Pry 
[asrar-star | 60 D0 8 


Notes: 1. These instructions use the Field instruction format (FORMAT 2).
2. "p" stands for the bit displacement from P0-P5 or from PR0-PR5.





Legend: Unsel = Unselected field 
Sel = Selected field 
A=A Input 
B =B Input 
Q=Q Register 
* = Updated 


Examples: 
RSTBIT-B,,3 3rd bit is set to 0 in B 
EXTBIT-STAT,,—4 4th bit in status register is extracted and 
inverted. 
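
The two examples above can be mirrored by a short software sketch. The function names are hypothetical. Following the EXTBIT-STAT example, a negative position is read here as "same bit number, inverted result"; the bit number is taken as the magnitude of p, which is an assumption about notation and not a statement about how the position input is encoded in hardware.

```python
MASK32 = 0xFFFFFFFF


def _bit_number(p):
    """Bit index selected by position p (assumed to be the magnitude of p)."""
    n = abs(p)
    if n > 31:
        raise ValueError("bit number must be 0..31")
    return n


def set_bit(x, p):
    """Yi = Xi for all bits except Yp, which is forced to 1."""
    return (x | (1 << _bit_number(p))) & MASK32


def reset_bit(x, p):
    """Yi = Xi for all bits except Yp, which is forced to 0."""
    return x & ~(1 << _bit_number(p)) & MASK32


def extract_bit(x, p):
    """Y0 = Xp (inverted when p is negative); Y31..Y1 are zero."""
    bit = (x >> _bit_number(p)) & 1
    return bit ^ 1 if p < 0 else bit
```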
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Aligned Fields 





LD000140 


le— W —r\« ita 





If position (P0-P5) ≥ 0, A is LSB aligned
Width (W0-W4) = 1 to 32


LD000151 


Non-Aligned Fields Case 2: 
k~+— W —++ P -»| 






Y: 







If position (P0-P5) < 0, B is LSB aligned
Width (W0-W4) = 1 to 32
LD000161 


Figure 6. Field Logical Operations 
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TABLE 15. FIELD LOGICAL INSTRUCTIONS 


[veut oe 
peeerton _[unn| et —*s (wee vue 
PPASSF-AL-A | 73 | Field Pass Ei ae Pees 
PPassraLe | eF_ ea [viee COPE 


PASSF-A


NOTF-AL-A Field Complement 
NOTF-AL-B ie 


NOTF-A if p>0, Yi= Rip 


if p <0, Yj- 
ORF-AL-A Field OR 3) 


77 


XORF-AL-A Field XOR 3) i= Aj 
4) if p>0, Yi= 
if p <0, Yj- 


XORF-A 

ANDF-AL-A Field AND 3)

ANDF-A 78 4) if p ≥ 0, Yi = A(i-p) AND Bi
if p< 2 Yj- 

EXTF-A Field Extract 4) 5) | 

EXTF-B ee 


EXTF-AB 
EXTF-BA 


Notes: 1. These instructions use the field instruction format (FORMAT 2). 

2. p ≤ i ≤ p+w-1. "p" stands for position displacement from P0-P5 or from PR0-PR5 and "w" for the width of the bit field
from W0-W4 or WR0-WR4. Whenever p + w > 32, operation takes place only over the portion of the field up to the end of
the word. No wraparound occurs.

3. This instruction uses the aligned format (see Figure 6). 

4. This instruction uses the unaligned field format (see Figure 6).
p ≥ 0: Case 1
p < 0: Case 2

5. If p is positive, the input is LSB aligned and Y output aligned at position. 
If p is negative, the input is aligned at |p| and Y output at LSB. 

6. Firstly, the concatenation of A(High Word) and B(Low Word) is rotated by the amount specified by the position (p). If p is 
positive, left-rotate is performed. If p is negative, right-rotate is performed. Secondly, the least significant bits on the Y output 
specified by the width (w) are extracted. 

7. Same as 6) except that B input is taken as a high word and A input as a low word. 


4) 5) 


Legend: Unsel = Unselected Field 
Sel = Selected Field 
A=A Input 
B =B Input 
Q=Q Register 
* = Updated 


For all examples, assume STATUS (7:0) is -7 and STATUS (12:8) is 3.


1. 0,PASSF-AL-B,11,20 Pass B to Y and test if B20 to B30
are all zero. Set Z status if so.


B: 1[00000000000]00000101011100110100
Z set to 1 in this case


2. 3,XORF-A,, Exclusive-OR bits A7-A9 with bits
B0-B2 and output to Y0-Y2. Pass
B3-B31 to Y3-Y31. Width and position
values are obtained from STATUS(12:0).
A: 011011100010010000101 ifrooh 101011 


B: 00011100001010001100101001001[001]


A9-7 ⊕ B2-0 = Y: 00011100001010001100101001001[101]
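
Example 2 can be checked with a small software model of a non-aligned field XOR. This is a sketch only: `field_mask` and `xorf_a` are hypothetical names, the field is clipped at bit 31 as Note 2 describes, and passing B through the unselected bits is taken from the wording of Example 2; it is assumed here to apply to the p ≥ 0 case as well.

```python
MASK32 = 0xFFFFFFFF


def field_mask(p, w):
    """Ones in bits p..p+w-1, clipped at bit 31 (no wraparound)."""
    return (((1 << w) - 1) << p) & MASK32


def xorf_a(a, b, p, w):
    """Non-aligned field XOR of A with B.

    p >= 0 (Case 1): A is LSB aligned and moved up to position p.
    p <  0 (Case 2): the A field starts at bit |p| and is moved down to the LSB.
    Bits outside the field pass B unchanged.
    """
    a &= MASK32
    b &= MASK32
    if p >= 0:
        mask = field_mask(p, w)
        a_aligned = (a << p) & MASK32
    else:
        mask = field_mask(0, w)
        a_aligned = a >> -p
    return ((a_aligned ^ b) & mask) | (b & ~mask & MASK32)
```

With position -7 and width 3, an A word whose bits 9-7 are 100 and the B word of Example 2 (0x1C28CA49) give 0x1C28CA4D, which matches the Y pattern shown above.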
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TABLE 16. MASK INSTRUCTION 


[Vout [ae 
scissile} pina’. Far ae TS eT Te 
Teass-wask | 7 | Generate Mask [| os | vi-e 1 | | 1 |. 1 1 


Notes: 1. This instruction uses the field instruction format (FORMAT 2).
2. p ≤ i ≤ p+w-1. "p" stands for the position displacement and "w" for the width of bit field.





Legend: Unsel = Unselected Field 
Sel = Selected Field 
A=A Input 
B =B Input 
Q=Q Register 
* = Updated


Example: Generates an 8-bit field mask pattern starting from bit position 10. 


31 18 10 9 0 
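
The mask in this example can be reproduced with a one-line software equivalent. The function name is hypothetical; the sketch assumes selected bits are 1 and unselected bits are 0, with the field clipped at bit 31 when p + w exceeds 32.

```python
def generate_mask(p, w):
    """Return a 32-bit word with ones in bits p..p+w-1 and zeros elsewhere."""
    if not (0 <= p <= 31 and 1 <= w <= 32):
        raise ValueError("expected 0 <= p <= 31 and 1 <= w <= 32")
    return (((1 << w) - 1) << p) & 0xFFFFFFFF


# 8-bit field starting at bit position 10: bits 17..10 are set.
mask = generate_mask(10, 8)
```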




APPLICATIONS 


Suggestions for Power and Ground Pin 
Connections 


The Am29332 operates in an environment of fast signal rise 
times and substantial switching currents. Therefore, care must 
be exercised during circuit board design and layout, as with 
any high-performance component. The following is a sug- 
gested layout, but since systems vary widely in electrical 
configuration, an empirical evaluation of the intended layout is 
recommended. 


The VCCT and GNDT pins, which carry output driver switching
currents, tend to be electrically noisy. The VCCE and GNDE
pins, which supply the ECL core of the device, tend to produce
less noise, and the circuits they supply may be adversely
affected by noise spikes on the VCCE plane. For this reason, it
is best to provide isolation between the VCCE and VCCT pins,
as well as independent decoupling for each. Isolating the
GNDE and GNDT pins is not required.




@ == Through Hole 

4% = Vcc Plane Connection 

C1 = C3 = C5 = 10 µF or greater (electrolytic or tantalum capacitor)

C2 = C4 = C6 = 0.1 µF or greater (ceramic or monolithic capacitor)





Printed Circuit-Board Layout Suggestions 


1. Use of a multi-layer PC board with separate power, ground, 
and signal planes is highly recommended. 


2. All VCCE and VCCT pins should be connected to the VCC
plane. VCCT pins should be isolated from VCCE pins by means
of a slot cut in the VCCE plane; see Figure 7. By physically
separating the VCCE and VCCT pins, coupled noise will be
reduced.


3. All GNDE and GNDT pins should be connected directly to
the ground plane.


4. The VCCT pins should be decoupled to ground with a 0.1-µF
ceramic capacitor and a 10-µF electrolytic capacitor, placed
as closely to the Am29332 as is practical. VCCE pins should
be decoupled to ground in a similar manner.


A suggested layout is shown in Figure 7. 




Figure 7. Suggested Printed Circuit-Board Layout 
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THERMAL RESISTANCE 14 a(°CM) 









200 400 600 
AIR VELOCITY (LINEAR FEET PER MINUTE) 


Figure 8. Am29332 Thermal Characteristics (Typical) 


SA Still Air 

OJA 200 LFM 
QA 600 LFM 
Heat Sink 
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ABSOLUTE MAXIMUM RATINGS 


Storage Temperature                      -65 to +150°C
Temperature Under Bias - Tc              -55 to +125°C
Supply Voltage to Ground Potential
   Continuous                            -0.5 to +7.0 V
DC Voltage Applied to Outputs
   for HIGH State                        -0.5 V to +VCC Max.
DC Input Voltage                         -0.5 to +5.5 V

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES

Commercial (C) Case Devices
   Temperature (Tc)                      0 to +85°C
   Supply Voltage VCC                    +4.75 V to +5.25 V

Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating range

Parameter Symbol | Parameter Description | Test Conditions (Note 1)

Output HIGH Voltage | VCC = 4.75 V, VIN = VIH or VIL, IOH = -1.2 mA | All Outputs
Output LOW Voltage | VCC = 4.75 V, VIN = VIH or VIL, IOL = 8 mA | All Outputs
Input HIGH Level (Guaranteed Logic HIGH Voltage)
Input LOW Level (Guaranteed Logic LOW Voltage)
Input Clamp Voltage | VCC = 4.75 V | All Inputs


PY0-3,
Y0-31





Volts 


FEL cen eC OOO 


= 


All Inputs Volts 


Volts 


[2.00] 
= 2.50 | 
2.00 | 


VCC = 5.25 V, SLAVE


IIL Input LOW Current VIN = 0.5 V


C, Z, V, N, b; 
PERR 


Other 


PYo- 3, 
Yo-31 


SLAVE 
OE-Y 
LK 


C, Z, V, N, b; 
PERR 


Other 


VCC = 5.25 V, All
Input HIGH Current VIN = 5.5 V Inputs
VCC = 5.25 V, All
VO = 2.4 V Outputs
VCC = 5.25 V, Except
VO = 0.5 V MSERR
Output Short-Circuit Current VCC = 5.75 V,
(Note 2) VO = 0.5 V
Power Supply Current VCC = 5.25 V TC = 0 to 85°C 1800
(Note 3) TC = 85°C 1690


Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type. 
2. Not more than one output should be shorted at a time. Duration of the short circuit test should not exceed one second. 
3. Measured with all inputs HIGH and outputs disabled. 


VCC = 5.25 V,


IIH Input HIGH Current VIN = 2.4 V







IOZH
Off State Output Current


IOZL
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ng = yg Seed = 3s oe, Se SG SSS Ho eee ee af 


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SWITCHING CHARACTERISTICS over operating range 


A. COMBINATIONAL PROPAGATION DELAYS


From To Max. Delay Max. Delay 


a 
[7 To-e ——SS—*dt o-Ps 
= A 























[3 Wore Sid 
= 
[45 Po-Ps S—~*d CP 
[16 [Pores Sid Yn 
[a7 ro-PsSC«dC NL 
2 
[9 or SSS~d P= PV 
es 
re ce ce 
ea 
ee CeCe 
a 


is 

Teonow——SSS*d SERA 
CHO SSCS~*~rSCSE | 
oe 
[se fron ves ws Pe 
[#0 [eZvnt | weere 
SC = 





SWITCHING CHARACTERISTICS (Cont'd.) 


B. SETUP AND HOLD TIMES 





Max. Value 


1 





Parameter (Note 2) 


Input Data Setup DA0-DA31, DB0-DB31
Input Data Hold DA0-DA31, DB0-DB31





With Respect To 


QO 
“5 


Byte Width Setup 


Uv 
wo 
Ni 


0 
Nh 
fo) 


| 
| 
Ww 
W 


NUIN 
Oo }|]O 


Borrow Setup BOROW 
mCi 
Mii 
Min 


Hold Mode Setup HOLD 
Hold Mode Hold HOLD 


Uy; UT U Uy) Ul; Ul] Ul] Ul] Uj Vi VDI V Ul! U UiU0 
nN 
Lye) 


i?) 


N 


iw 
NO 


QO 
5 


N 
Le) 


_ io) 
~~ fafa oni —_—] — nN ~~ vy 


Ig 

Ig 

-W4 

-W4 
m 


OQ 


42 
43 
Ad 
45 
46 
47 
48 
51 
54 
56 
57 
61 





C. MINIMUM CLOCK REQUIREMENTS 


re ea 


Minimum Clock HIGH Time


D. ENABLE AND DISABLE TIMES 


Ne toe) ee ) coe | Max. Delay | Max. Delay 
a oe ae 




















Y0-Y31, PY0-PY3 Output Enable Time
| OE-Y | Y0-Y31, PY0-PY3 Output Disable Time


SLAVE Slave Mode 25
C, Z, V, N, L Enable Time
PERR
67 SLAVE Y0-Y31, PY0-PY3 Slave Mode 25 25
C, Z, V, N, L Disable Time
PERR


Notes: 1. It is the responsibility of the user to maintain a case temperature of 85°C or less. AMD recommends an air velocity of at 
least 200 linear feet per minute over the heatsink. 
2. See timing diagram for desired mode of operation to determine clock edge to which these setup and hold times apply. 
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SWITCHING TEST CIRCUITS 


Vec 


¢ S3 TC001083 


TC001102 


A. Three-State Outputs

   R1 = (5.0 - VBE - VOL) / (IOL + VOL/1K)


B. Normal Outputs

   R2 = 2.4 V / IOH

   R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2)


Notes: 1. CL = 50 pF includes scope probe, wiring and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = 5.0 pF for output disable tests.
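
As a worked illustration of the load-resistor relations for these test circuits, the sketch below evaluates them numerically. It assumes the relations read R2 = 2.4 V / IOH and R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2), with a fixed 1K in place of R2 for three-state outputs. The function name and the default VBE of 0.7 V are assumptions made for the example, not data-sheet values.

```python
def load_resistors(i_oh, i_ol, v_ol, v_be=0.7, three_state=False):
    """Return (R1, R2) in ohms for the switching test load.

    i_oh and i_ol are current magnitudes in amperes, v_ol and v_be in volts.
    For three-state outputs the lower resistor is the fixed 1K of circuit A.
    """
    r2 = 1000.0 if three_state else 2.4 / i_oh
    r1 = (5.0 - v_be - v_ol) / (i_ol + v_ol / r2)
    return r1, r2


# Using the DC test currents quoted for the outputs (1.2 mA and 8 mA)
# and an assumed VOL of 0.5 V:
r1_normal, r2_normal = load_resistors(i_oh=1.2e-3, i_ol=8e-3, v_ol=0.5)
r1_tristate, r2_tristate = load_resistors(1.2e-3, 8e-3, 0.5, three_state=True)
```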





SWITCHING TEST WAVEFORMS 
















Setup, Hold, and Release Times (data input and timing input, 0 V to 3 V swing, measured at 1.5 V)
Pulse Width (LOW-HIGH-LOW and HIGH-LOW-HIGH pulses, measured at 1.5 V)


Notes: 1. Diagram shown for HIGH data only. Output transition
may be opposite sense.
2. Cross hatched area is don't care condition. 
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SAME PHASE ___ 
INPUT TRANSITION 


OUTPUT 


OPPOSITE PHASE ___ 
INPUT TRANSITION 


SWITCHING TEST WAVEFORMS (Cont'd.) 


Enable 


CONTROL __ 
INPUT 


OUTPUT 
NORMALLY 
LOW 


"2H 


yz 


Disable 


OUTPUT 
NORMALLY 15 V 
HIGH 
So OPEN 0.5 V 
~O0 Vv 


WFRO2980 


Propagation Delay 


Test Philosophy and Methods 


The following points give the general philosophy that we apply
to tests that must be properly engineered if they are to be
implemented in an automatic environment. The specifics of
which philosophies apply to which tests are shown.


1. Ensure the part is adequately decoupled at the test head.
Large changes in supply current when the device switches
may cause function failures due to VCC changes.


2. Do not leave inputs floating during any tests, as they may
oscillate at high frequency.


3. Do not attempt to perform threshold tests at high speed.
Following an input transition, ground current may change by
as much as 400 mA in 5-8 ns. Inductance in the ground
cable may allow the ground pin at the device to rise by
hundreds of millivolts momentarily.


4. Use extreme care in defining input levels for AC tests. Many
inputs may be changed at once, so there will be significant
noise at the device pins that may not actually reach VIL or
VIH until the noise has settled. AMD recommends using
VIL ≤ 0 V and VIH ≥ 3 V for AC tests.


5. To simplify failure analysis, programs should be designed to
perform DC, Function, and AC tests as three distinct groups
of tests.


6. Capacitive Loading for AC Testing


Automatic testers and their associated hardware have stray 
capacitance that varies from one type of tester to another, 
but is generally around 50 pF. This, of course, makes it 
impossible to make direct measurements of parameters 
that call for a smaller capacitive load than the associated 
stray capacitance. Typical examples of this are the so- 
called ''float delays'' which measure the propagation 
delays into and out of the high impedance state and are 
usually specified at a load capacitance of 5.0 pF. In these 
cases, the test is performed at the higher load capacitance 
(typically 50 pF) and engineering correlations based on 
data taken with a bench set up are used to predict the 
result at the lower capacitance. 


WFR02660


Enable and Disable Times


Notes: 1. Diagram shown for Input Control Enable-LOW and Input Control Disable-HIGH.
2. S1, S2 and S3 of Load Circuit are closed except where shown.


Similarly, a product may be specified at more than one
capacitive load. Since the typical automatic tester is not
capable of switching loads in mid-test, it is impossible to
make measurements at both capacitances even though
they may both be greater than the stray capacitance. In
these cases, a measurement is made at one of the two
capacitances. The result at the other capacitance is
predicted from engineering correlations based on data
taken with a bench set up and the knowledge that certain
DC measurements (IOH, IOL, for example) have already
been taken and are within specification. In some cases,
special DC tests are performed in order to facilitate this
correlation.


7. Threshold Testing 


The noise associated with automatic testing, the long,
inductive cables, and the high gain of bipolar devices when
in the vicinity of the actual device threshold, frequently give
rise to oscillations when testing high-speed circuits.
These oscillations are not indicative of a reject device, but
instead, of an overtaxed test system. To minimize this
problem, thresholds are tested at least once for each input
pin. Thereafter, "hard" HIGH and LOW levels are used for
other tests. Generally this means that function and AC
testing are performed at "hard" input levels rather than at
VIL Max. and VIH Min.


8. AC Testing


Occasionally, parameters are specified that cannot be
measured directly on automatic testers because of tester
limitations. Data input hold times often fall into this category.
In these cases, the parameter in question is guaranteed
by correlating these tests with other AC tests that have
been performed. These correlations are arrived at by the
cognizant engineer by using data from precise bench
measurements in conjunction with the knowledge that
certain DC parameters have already been measured and
are within specification.

In some cases, certain AC tests are redundant since they
can be shown to be predicted by other tests that have
already been performed. In these cases, the redundant
tests are not performed.














SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS 


WAVEFORM INPUTS OUTPUTS 
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MAY CHANGE Mirae 
FROM H TOL GING 
FROMH TOL 


MAY CHANGE WILL BE 
FROML TOH CHANGING 
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DON'T CARE; CHANGING; 
ANY CHANGE STATE 
PERMITTED UNKNOWN 


CENTER 
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SWITCHING WAVEFORMS (Cont'd.) 


INPUTS* X 
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MSERR 





Propagation Delays (SLAVE = LOW) 


Inputs: PA0-PA3, PB0-PB3, DA0-DA31, DB0-DB31, I0-I8, W0-W4, P0-P5, CP, RS,
MCin, MLINK, M/m, BOROW, HOLD
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SWITCHING WAVEFORMS (Cont'd.) 
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Propagation Delay (SLAVE = HIGH) 
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Enable/Disable | (SLAVE = HIGH) 
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Enable/Disable Il (OE-Y = LOW) 
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INPUT/OUTPUT CIRCUIT DIAGRAM 


DRIVING OUTPUT 


= | lot 


(All Devices) 
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Am29334. 


Four-Port Dual-Access Register File 


DISTINCTIVE CHARACTERISTICS 


• Fast
  With an access time of 24 ns, the Am29334 supports
  80-90 ns microcycle time when used with the Am29300
  Family for 32-bit systems.
• 64 x 18 Bits Wide Register File
  The Am29334 is a high-performance, high-speed,
  dual-access RAM with two READ ports and two WRITE
  ports.
• Cascadable
  The Am29334 is cascadable to support either wider
  word widths, deeper register files, or both.
• Simplified Timing Control
  Control for write enable timing and for on-chip read/write
  address multiplexer are derived from a single-phase
  clock input.
• Byte Parity Storage
  Width of 18 bits facilitates byte parity storage for each
  port and provides consistency with the Am29332 32-bit
  ALU.
• Byte Write Capability
  Individual byte write enables allow byte or full word
  write.


GENERAL DESCRIPTION 


The Am29334 is a 64-word deep and 18-bit wide dual- 
access register file designed to support other members of 
the Am29300 Family by providing high-speed storage. It 
has two write and two read ports for data and four 6-bit 
address ports. Two address ports are associated with each 
pair of read and write data ports, one to read data and the 
other to write. The device is capable of performing two 
reads and two writes in one cycle. The 18-bit wide register 


file allows storage of byte parity to support parity check and 
generate in the Am29332 32-bit ALU. Independent control 
for each read and write data port allows the Am29334 to be 
used as a high-speed shared memory or as a mailbox for a 
multiprocessor system. The device is designed with an
access time of 24 ns. It is housed in a 120-lead pin-grid-array
package.


BLOCK DIAGRAM 


DUAL ACCESS


RAM 
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of 
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RELATED AMD PRODUCTS 


Part No. | Description


Am29325 32-Bit Floating Point Processor 
Am29331 16-Bit Microprogram Sequencer 
Am29332 32-Bit Extended Function ALU 


CONNECTION DIAGRAM 


DA0O DAO2 DA04 DAOS DADS DA12 


ARAQ DA03  DAOS DAO? DAO 


AWAO  DA01 GNDE DA06 ~=VCCE 


DBO4 VCCE DBO08 DBO9 DBI15 


DBO3 VCCE DBOS DBI DB12 


DB02 VCCE DB06 DBI0 DB1I4 
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CD010391 


Note: GNDT = TTL GND 
GNDE = ECL GND 
VCCT = TTL VCC 
VCCE = ECL VCC 

















PAD PIN PAD PIN PAD PAD 









































































AwaA2 
ARA3 
AWA4 
YB1 
TTL GND 
YB7 
YB8 
YB12 
TTL GND 
YB15 
WEpL 
WEgc 
AwB5 
ARA2 
AWA3 
ARA4 
YB2 
YB4 
YB6 
YBg 
YB11 
YB13 


YB16 
WEsBH 


LEp 
ARB5 
Awa 
ARA1 
YBO 
YB3 





1. Pins E-1, E-12 and E-13 are physically shorted together in the package. 
2. Pins J-11, J-12 and J-13 are physically shorted together in the package. 


TABLE OF INTERCONNECTIONS 


(Sorted by Pin No.) 


Daio 


Dpi5 
Dpi2 
Dpi4 
Dai2 
Dai3 
Dait 


DAi6 
DA15 
DA14 
ARBO 
Dpi7 
DB16 
LEa 
ARAS5 
DA17 
YAO 
YAS. 
OEA 
YA7 


YAi2 
YA14 
YA17 
AWBO 
DB13 
WEaAc 
AWA5 
ARB4 
YA1 
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ECL Vcc 


ECL GND 
ECL GND 
ECL GND 


TTL Vcc 


YA4 
YA6 
YAB 
YA14 
YA13 
YA15 
ARB3 
AWB2 


ARB1 
WEaL 


WEaH 
AwB4 
YA2 

TTL GND 
YAS 

YAQ 
YA10 
TTL GND 
YA16 
AwB3 
ARB2 
AwB1 


















































TABLE OF INTERCONNECTIONS (Cont'd.) 


(Sorted by Pin Name) 


PAD PAD PAD PAD 
NO. NO. NO. NO. 



















LOGIC SYMBOL


DA0-DA17


AWA0-AWA5


ARA0-ARA5


METALLIZATION AND PAD LAYOUT





LS002220 
Die Size: 258 x 251 mils 
Equivalent Gate Count: 3500 





ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid
Combination) is formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29334 G C B 


— — — 


e. OPTIONAL PROCESSING


Blank = Standard processing 
B = Burn-in 


d. TEMPERATURE RANGE 
C = Commercial (Tc = 0 to + 85°C) 


c. PACKAGE TYPE 
G = 120-Lead Pin Grid Array with Heatsink 
(CG 120) 


b. SPEED OPTION 
Not Applicable 


a. DEVICE NUMBER/DESCRIPTION 
Am29334 
Four-Port Dual-Access Register File 


Valid Combinations 






Valid Combinations 


AM29334 GC, GCB 


Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations, to check on newly released valid combinations, 
and to obtain additional data on AMD's standard military 
grade products. 
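
The way an order number is assembled from the fields above can be sketched as follows. The helper name is hypothetical, and writing the fields without separating spaces is an assumption; only the two valid combinations listed for this device (GC and GCB) are accepted.

```python
DEVICE_NUMBER = "AM29334"
VALID_COMBINATIONS = {"GC", "GCB"}   # as listed under Valid Combinations


def build_order_number(package="G", temperature="C", processing=""):
    """Concatenate device number, package, temperature range and processing.

    package:     "G" = 120-lead pin grid array with heatsink
    temperature: "C" = commercial
    processing:  ""  = standard processing, "B" = burn-in
    """
    suffix = package + temperature + processing
    if suffix not in VALID_COMBINATIONS:
        raise ValueError(f"{suffix!r} is not a listed valid combination")
    return DEVICE_NUMBER + suffix


standard_part = build_order_number()
burn_in_part = build_order_number(processing="B")
```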
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PIN DESCRIPTION 


ARA0-ARA5  Addresses (Inputs, Active HIGH)
The 6-bit field presented at the ARA inputs selects one of
64 memory words for presentation to the YA Data Latch.


ARB0-ARB5  Addresses (Inputs, Active HIGH)
The 6-bit field presented at the ARB inputs selects one of
64 memory words for presentation to the YB Data Latch.


YA0-YA17  Data Latch (Outputs, Three-State)
The 18-bit YA Data Latch outputs.


YB0-YB17  Data Latch (Outputs, Three-State)
The 18-bit YB Data Latch outputs.


AWA0-AWA5  Addresses (Inputs, Active HIGH)
The 6-bit field presented at the AWA inputs selects one of
64 words for writing new data from the DA inputs.


AWB0-AWB5  Addresses (Inputs, Active HIGH)
The 6-bit field presented at the AWB inputs selects one of
64 words for writing new data from the DB inputs.


DA0-DA17  Data (Inputs, Active HIGH)
New data is written into the word selected by the AWA
address inputs through these inputs.


DB0-DB17  Data (Inputs, Active HIGH)
New data is written into the word selected by the AWB
address inputs through these inputs.


LEA  YA Data Latch Enable (Input)
The LEA input controls the latch for the YA output port.
When LEA is HIGH, the latch is open (transparent), and data
from the RAM, as selected by the ARA address inputs, is
present at the YA outputs. When LEA is LOW, the latch is
closed and it retains the last data read from the RAM
selected by the ARA address inputs.


LEB  YB Data Latch Enable (Input)
The LEB input controls the latch for the YB output port.
When LEB is HIGH, the latch is open (transparent), and data
from the RAM, as selected by the ARB address inputs, is
present at the YB outputs. When LEB is LOW, the latch is
closed and it retains the last data read from the RAM
selected by the ARB address inputs.


OEA  YA Output Enable (Input, Active LOW)
When OEA is LOW, data in the YA Data Latch is present at
the YA outputs. If OEA is HIGH, YA outputs are in the
high-impedance (off) state.


OEB  YB Output Enable (Input, Active LOW)
When OEB is LOW, data in the YB Data Latch is present at
the YB outputs. If OEB is HIGH, YB outputs are in the
high-impedance (off) state.


WEAC  Write Enable (Input, Active LOW)
When WEAC is LOW together with WEAH and WEAL, new
data is written into the word selected by the AWA address
inputs. When WEAC is HIGH, no data is written into the RAM
through the A port.


WEBC  Write Enable (Input, Active LOW)
When WEBC is LOW together with WEBH and WEBL, new
data is written into the word selected by the AWB address
inputs. When WEBC is HIGH, no data is written into the RAM
through the B port.


WEAH  High-Byte Write Enable (Input, Active LOW)
When WEAH is LOW together with WEAC, new data is
written into the high byte of the word selected by the AWA
address inputs. When WEAH is HIGH, no data is written into
the high byte of the word selected by the AWA address
inputs.


WEBH  High-Byte Write Enable (Input, Active LOW)
When WEBH is LOW together with WEBC, new data is
written into the high byte of the word selected by the AWB
address inputs. When WEBH is HIGH, no data is written into
the high byte of the word selected by the AWB address
inputs.


WEAL  Low-Byte Write Enable (Input, Active LOW)
When WEAL is LOW together with WEAC, new data is
written into the low byte of the word selected by the AWA
address inputs. When WEAL is HIGH, no data is written into
the low byte of the word selected by the AWA address
inputs.


WEBL  Low-Byte Write Enable (Input, Active LOW)
When WEBL is LOW together with WEBC, new data is
written into the low byte of the word selected by the AWB
address inputs. When WEBL is HIGH, no data is written into
the low byte of the word selected by the AWB address
inputs.
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FUNCTIONAL DESCRIPTION 


The part has two read ports (YA0-YA17, YB0-YB17), two
write ports (DA0-DA17, DB0-DB17), four addresses
(ARA0-ARA5, AWA0-AWA5, ARB0-ARB5, AWB0-AWB5),
two latch enables (LEA, LEB), two output enables (OEA, OEB),
and six write enables (WEAC, WEAL, WEAH, WEBC, WEBL,
WEBH) that allow writing of data into one or both bytes of a
word. The separate read and write addresses facilitate creation
of three- and four-address architectures and allow
address set-up and RAM access to overlap.





Since the A and B sides are identical, only operation of the A
side is described. The address multiplexer provides the RAM
with the address ARA when WEAC = HIGH and with the
address AWA when WEAC = LOW. Internally the part is
designed so that there is no race condition between the write
address and the write enable. In most cases WEAC and LEA
will be connected to the clock as shown in Figure 2 so that
reading will take place in the first part of a clock cycle and
writing in the last part. The latch at the output of the RAM is
transparent when LEA = HIGH and retains the data when
LEA = LOW. The latch has a three-state output YA controlled
by OEA. Each word is split into two bytes of 9 bits that can be
individually written. The low byte covers bits 0 through 8 and
the high byte covers bits 9 through 17. One or both bytes of
the data at DA are written into the location given by AWA when
the common write enable (WEAC) and the appropriate byte
write enables (WEAL and WEAH) are active. Two special
cases then arise. First, if a location is written into and read at















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the same time, the value read is the value being written. 
Second, if a location is written into from both the A side and 
the B side, the value written is undefined, but the operation is 
not harmful. 


The transparency mode during a write (WEAC = LOW) allows
the data-in (DA) to not only be written into memory but also to
appear at the output (YA) when the output latch enable (LEA) is HIGH
and the output enable control (OEA) is LOW.
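
The behavior of one side of the register file, as described above, can be summarized in a small behavioral model. This is a sketch, not a timing-accurate description: the class and argument names are hypothetical, logic levels are passed as 0 and 1, and a disabled output is represented by `None`. During a partial (single-byte) write the model returns the stored word after the write, which is an assumption about what appears at the output in that case.

```python
LOW_BYTE = 0x1FF            # bits 0 through 8
HIGH_BYTE = 0x1FF << 9      # bits 9 through 17


class PortSide:
    """One side (A or B) of the dual-access RAM; both sides share `ram`."""

    def __init__(self, ram):
        self.ram = ram       # list of 64 words, 18 bits each
        self.latch = 0       # output data latch

    def evaluate(self, ar, aw, d, we_c, we_h, we_l, le, oe):
        """Apply one set of input levels and return the Y output (or None)."""
        if we_c == 0:
            # Write enable active: the address multiplexer selects AW.
            addr = aw
            word = self.ram[addr]
            if we_l == 0:
                word = (word & ~LOW_BYTE) | (d & LOW_BYTE)
            if we_h == 0:
                word = (word & ~HIGH_BYTE) | (d & HIGH_BYTE)
            self.ram[addr] = word
        else:
            # Write enable inactive: the multiplexer selects AR.
            addr = ar
        if le == 1:
            # Latch is transparent; otherwise it retains the last value.
            self.latch = self.ram[addr]
        return self.latch if oe == 0 else None
```

Creating two `PortSide` objects over the same 64-entry list models the A and B sides sharing one memory array.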


Extension To Four Read Ports and Two Write 
Ports 


A RAM with four read ports and two write ports can be made 
by using two dual access RAMs and connecting each of the 
write ports, write addresses, and write enables in parallel for 
the two devices. As an example, this RAM may provide data 
storage for a data ALU and an address adder as shown in 
Figure 3. A location should not be read before it has been 
written into for the first time as the contents of the two dual 
access RAMs are likely to be different upon power-up. 
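The parallel-write arrangement can be sketched as follows. The class names are hypothetical, each device is reduced to a plain word array, and `None` is used here to stand for undefined power-up contents.

```python
class DualAccessRam:
    """One 64 x 18 device reduced to its storage array."""

    def __init__(self, words=64):
        self.mem = [None] * words   # None models undefined power-up contents

    def write(self, addr, data):
        self.mem[addr] = data & 0x3FFFF   # keep 18 bits

    def read(self, addr):
        return self.mem[addr]


class FourReadTwoWriteRam:
    """Two devices with write ports, write addresses, and enables in parallel."""

    def __init__(self):
        self.devices = (DualAccessRam(), DualAccessRam())

    def write(self, addr, data):
        # Every write reaches both devices, so their contents stay identical
        # for any location that has been written at least once.
        for dev in self.devices:
            dev.write(addr, data)

    def read(self, port, addr):
        # Ports 0 and 1 read the first device, ports 2 and 3 the second.
        return self.devices[port // 2].read(addr)
```

A read of a location that has never been written returns `None` in this sketch, mirroring the caution about power-up contents.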


32 Words x 36 Bits Single-Access RAM 


It is possible to convert the 64 word x 18 bit dual-access
RAM into a 32 word x 36 bit single-access RAM. This is done
by storing the upper half of the 36 bits in the upper half of the
64 words and addressing these from the A side. The lower half
of the 36 bits is stored in the lower half of the 64 words
and addressed from the B side. This arrangement, which is
shown in Figure 4, does not change the capacity of the RAM,
but the dual access is lost.
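One way to picture the address mapping is the sketch below. It assumes that the A side always addresses words 32 through 63 and the B side words 0 through 31; the function names are hypothetical and the 64-word array stands in for the device.

```python
WORD_MASK = 0x3FFFF  # 18 bits


def write36(ram, addr, value):
    """Store a 36-bit value at one of 32 addresses in a 64 x 18 array."""
    if not 0 <= addr < 32:
        raise ValueError("address must be in 0..31")
    ram[32 + addr] = (value >> 18) & WORD_MASK   # upper half, A side
    ram[addr] = value & WORD_MASK                # lower half, B side


def read36(ram, addr):
    """Reassemble the 36-bit value from the two 18-bit halves."""
    if not 0 <= addr < 32:
        raise ValueError("address must be in 0..31")
    return (ram[32 + addr] << 18) | ram[addr]
```

Both sides are busy on every access, which is why the dual access is lost while the total capacity (64 x 18 = 32 x 36 bits) is unchanged.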








[Block diagram; legible blocks: Am29334 64 x 18, Am29332 32-Bit ALU, Am29323 32 x 32 Parallel Multiplier]

Figure 1. Am29300 Family High-Performance System Block Diagram
[Timing diagram; signals: CP, WEAC, LEA; read and write address selection; WEAH, WEAL; read data; write data]

Figure 2. Read through YA and Write through DA in a Single Cycle (Two Bytes)

Figure 3. RAM with Four Read Ports and Two Write Ports

Figure 4. 32 x 36 RAM (Single Access) Using 64 x 18 Dual-Access RAM


APPLICATIONS 


Suggestions for Power and Ground Pin 
Connections 


The Am29334 operates in an environment of fast signal rise
times and substantial switching currents. Therefore, care must
be exercised during circuit board design and layout, as with
any high-performance component. The following is a suggested
layout, but since systems vary widely in electrical
configuration, an empirical evaluation of the intended layout is
recommended.


The VCCT and GNDT pins, which carry output driver switching
currents, tend to be electrically noisy. The VCCE and GNDE
pins, which supply the ECL core of the device, tend to produce
less noise, and the circuits they supply may be adversely
affected by noise spikes on the VCCE plane. For this reason, it
is best to provide isolation between the VCCE and VCCT pins,
as well as independent decoupling for each. Isolating the
GNDE and GNDT pins is not required.





Printed Circuit Board Layout Suggestions 


1. Use of a multi-layer PC board with separate power, ground,
and signal planes is highly recommended.

2. All VCCE and VCCT pins should be connected to the VCC
plane. VCCT pins should be isolated from VCCE pins by means
of a slot cut in the VCC plane; see Figure 5. By physically
separating the VCCE and VCCT pins, coupled noise will be
reduced.

3. All GNDE and GNDT pins should be connected directly to
the ground plane.

4. The VCCT pins should be decoupled to ground with a 0.1-µF
ceramic capacitor and a 10-µF electrolytic capacitor, placed
as closely to the Am29334 as is practical. VCCE pins should
be decoupled to ground in a similar manner.


A suggested layout is shown in Figure 5. 


[Layout drawing; pin grid columns A through N]

● = Through Hole
** = VCC Plane Connection
C1 = C3 = C5 = 10 µF or greater (electrolytic
or tantalum capacitor)
C2 = C4 = C6 = 0.1 µF or greater (ceramic or
monolithic capacitor)

Figure 5. Suggested Printed Circuit Board Layout
Figure 6. Am29334 Thermal Characteristics (Typical)

ABSOLUTE MAXIMUM RATINGS

Storage Temperature .......................... -65 to +150°C
Temperature Under Bias - TC .................. -55 to +125°C
Supply Voltage to Ground Potential
  Continuous ................................. -0.5 to +7.0 V
DC Voltage Applied to Outputs
  for High State ............................. -0.5 V to +VCC Max
DC Input Voltage ............................. -0.5 to +5.5 V

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.

OPERATING RANGES

Commercial (C) Devices
  Temperature (TC) ........................... 0 to +85°C
  Supply Voltage ............................. +4.75 to +5.25 V

Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating range

Parameter Symbol | Parameter Description | Test Conditions (Note 1) | Limit | Units
VOH | Output HIGH Voltage | VCC = Min., VIN = VIL or VIH, IOH = -3 mA | 2.4 | Volts
VOL | Output LOW Voltage | VCC = Min., VIN = VIL or VIH, IOL = 16 mA | | Volts
VIH | Input HIGH Level | Guaranteed Input Logical HIGH Voltage for All Inputs | 2.0 | Volts
VIL | Input LOW Level | Guaranteed Input Logical LOW Voltage for All Inputs | | Volts
 | Input Clamp Voltage | VCC = Min., IIN = -18 mA | -1.2 | Volts
 | Input LOW Current | VCC = Max., VIN = 0.5 V | |
 | Input HIGH Current | VCC = Max., VIN = 2.4 V | |
 | Input HIGH Current | VCC = Max., VIN = 5.5 V | |
IOZH, IOZL | Off-State (High-Impedance) Output Current | VCC = Max. | |
ISC | Output Short-Circuit Current (Note 2) | VCC = Max. + 0.5 V, VO = 0.5 V | |
 | Power Supply Current (Note 3) | COM'L Only: TC = 0 to +85°C; TC = +85°C. MIL Only: TC = -55 to +125°C; TC = +125°C | |

[Other legible limit values, row assignment not recoverable: 950, 820]

Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type.
2. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second.
3. Measured with all inputs HIGH.
4. Recommended air velocity is 200 linear feet per minute.
SWITCHING CHARACTERISTICS over operating range (Note 1)

Parameter | From / To
Output Enable | OEA or OEB to YA or YB
Data Setup Time | DA or DB to WEA or WEB
Data Hold Time | DA or DB to WEA or WEB
Address Setup Time | AWA or AWB to WEA or WEB
Address Hold Time | AWA or AWB to WEA or WEB
Address Setup Time | ARA or ARB to LEA or LEB
Address Hold Time | ARA or ARB to LEA or LEB
Latch Close Before Write | LEA or LEB to WEA or WEB
Write Pulse Width | WEA or WEB (LOW)
Latch Data Capture |

Notes: 1. WEA = WEAC + WEAL/H
WEB = WEBC + WEBL/H
2. YA and YB are tested independently.
SWITCHING TEST CIRCUIT

Three-State Outputs

Notes: 1. CL = 50 pF includes scope probe, wiring and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = 5.0 pF for output disable tests.

SWITCHING WAVEFORMS

[Waveform: Transparency Function (same for B Port). Note: LEA = HIGH, OEA = LOW.]

INPUT/OUTPUT CIRCUIT DIAGRAM
Am29434

ECL Four-Port, Dual-Access Register File

PRELIMINARY

DISTINCTIVE CHARACTERISTICS

• Fast
  With an access time of 20 ns, the Am29434 supports
  50-60 ns microcycle time when used with the Am29400
  Family for 32-bit systems.
• 64 x 18 Bits Wide Register File
  The Am29434 is a high-performance, high-speed, dual-
  access RAM with two READ ports and two WRITE
  ports.
• Cascadable
  The Am29434 is cascadable to support either wider
  word widths, deeper register files, or both.
• Simplified Timing Control
  Control for write enable timing and for the on-chip read/
  write address multiplexer is derived from a single-
  phase clock input.
• Byte Parity Storage
  Width of 18 bits facilitates byte parity storage for each
  port and provides consistency with the Am29432 32-bit
  ALU.
• Byte Write Capability
  Individual byte-write enables allow byte or full word
  write.


GENERAL DESCRIPTION 


The Am29434 is a 64-word deep and 18-bit wide dual-
access register file designed to support other members of
the Am29400 Family by providing high-speed storage. It
has two write and two read ports for data and four 6-bit
address ports. Two address ports are associated with each
pair of read and write data ports, one to read data and the
other to write. The device is capable of performing two
reads and two writes in one cycle. The 18-bit wide register
file allows storage of byte parity to support parity check and
generate in the Am29432 32-bit ALU. Independent control
for each read and write data port allows the Am29434 to be
used as a high-speed shared memory or as a mailbox for a
multiprocessor system. The device is designed with an
access time of 20 ns. It is housed in a 120-lead pin grid
array package.
CONNECTION DIAGRAM
120-Lead PGA*

*Pinout observed from pin side of package.

TABLE OF INTERCONNECTIONS
(Sorted by Pin No., by Pin Name, and by Pad No.)

[Pin and pad assignment tables not legible]

Notes: 1. VCC is the most positive power supply voltage for internal chip logic.
2. VCCO is the most positive power supply for output buffers.
3. VEE is the most negative power supply for all logic.
4. Pins E-11, E-12, and E-13 are physically shorted together in the package.
5. Pins J-11, J-12, and J-13 are physically shorted together in the package.
LOGIC SYMBOL

[Logic symbol; data inputs DA0-DA17 and DB0-DB17]

Die Size: 251 x 258 mils
Equivalent gate count: 2700 gates
ORDERING INFORMATION
Standard Products

AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:
A. Device Number
B. Speed Option (if applicable)
C. Package Type
D. Temperature Range
E. Optional Processing

AM29434 G C B

E. OPTIONAL PROCESSING
Blank = Standard processing
B = Burn-in

D. TEMPERATURE RANGE
C = Commercial (0 to +70°C)

C. PACKAGE TYPE
G = 120-Pin Pin Grid Array (CG 120*)

B. SPEED OPTION
Not Applicable

A. DEVICE NUMBER/DESCRIPTION (include revision letter)
Am29434 ECL Four-Port, Dual-Access Register File

* Preliminary. Subject to Change.

Valid Combinations

AM29434 | GC, GCB

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released valid combinations,
and to obtain additional data on AMD's standard military
grade products.
PIN DESCRIPTION

ARA0-ARA5 Addresses (Inputs, Active HIGH)
The six-bit field presented at the ARA inputs selects one of 64
memory words for presentation to the YA Data Latch.

ARB0-ARB5 Addresses (Inputs, Active HIGH)
The six-bit field presented at the ARB inputs selects one of
64 memory words for presentation to the YB Data Latch.

YA0-YA17 Data Latch (Outputs)
The 18-bit YA Data Latch Outputs.

YB0-YB17 Data Latch (Outputs)
The 18-bit YB Data Latch Outputs.

AWA0-AWA5 Addresses (Inputs, Active HIGH)
The six-bit field presented at the AWA inputs selects one of
64 words for writing new data from the DA inputs.

AWB0-AWB5 Addresses (Inputs, Active HIGH)
The six-bit field presented at the AWB inputs selects one of
64 words for writing new data from the DB inputs.

DA0-DA17 Data (Inputs, Active HIGH)
New data is written into the word, selected by the AWA
address inputs, through these inputs.

DB0-DB17 Data (Inputs, Active HIGH)
New data is written into the word, selected by the AWB
address inputs, through these inputs.

LEA YA Data Latch Enable (Input)
The LEA input controls the Latch for the YA output port.
When LEA is HIGH, the Latch is open (transparent) and data
from the RAM, as selected by the ARA address inputs, is
present at the YA outputs. When LEA is LOW, the Latch is
closed and it retains the last data read from the RAM
selected by the ARA address inputs.

LEB YB Data Latch Enable (Input)
The LEB input controls the Latch for the YB output port.
When LEB is HIGH, the Latch is open (transparent) and data
from the RAM, as selected by the ARB address inputs, is
present at the YB outputs. When LEB is LOW, the Latch is
closed and it retains the last data read from the RAM
selected by the ARB address inputs.

OEA YA Output Enable (Input, Active LOW)
When OEA is LOW, data in the YA Data Latch is present at
the YA outputs. If OEA is HIGH, YA outputs are in the LOW
logic (off) state.

OEB YB Output Enable (Input, Active LOW)
When OEB is LOW, data in the YB Data Latch is present at
the YB outputs. If OEB is HIGH, YB outputs are in the LOW
logic (off) state.

WEAC Write Enable (Input, Active LOW)
When WEAC is LOW together with WEAH and WEAL, new
data is written into the word selected by the AWA address
inputs. When WEAC is HIGH, no data is written into the RAM
through the A port.

WEBC Write Enable (Input, Active LOW)
When WEBC is LOW together with WEBH and WEBL, new
data is written into the word selected by the AWB address
inputs. When WEBC is HIGH, no data is written into the RAM
through the B port.

WEAH High-Byte Write Enable (Input, Active LOW)
When WEAH is LOW together with WEAC, new data is
written into the high byte of the word selected by the AWA
address inputs. When WEAH is HIGH, no data is written into
the high byte of the word selected by the AWA address
inputs.

WEBH High-Byte Write Enable (Input, Active LOW)
When WEBH is LOW together with WEBC, new data is
written into the high byte of the word selected by the AWB
address inputs. When WEBH is HIGH, no data is written into
the high byte of the word selected by the AWB address
inputs.

WEAL Low-Byte Write Enable (Input, Active LOW)
When WEAL is LOW together with WEAC, new data is
written into the low byte of the word selected by the AWA
address inputs. When WEAL is HIGH, no data is written into
the low byte of the word selected by the AWA address
inputs.

WEBL Low-Byte Write Enable (Input, Active LOW)
When WEBL is LOW together with WEBC, new data is
written into the low byte of the word selected by the AWB
address inputs. When WEBL is HIGH, no data is written into
the low byte of the word selected by the AWB address
inputs.

VCC Internal Logic Ground
This is the most positive voltage in the internal logic. It is
used as the reference level for internal logic.

VCCO Output Drive Ground
This is the most positive voltage in the output buffer logic. It
is used as the reference level for the buffer logic.

VEE Power Supply Voltage
This is the most negative voltage. It provides power for
internal and buffer logic.
FUNCTIONAL DESCRIPTION

The part has two read ports (YA0-YA17, YB0-YB17), two
write ports (DA0-DA17, DB0-DB17), four addresses
(ARA0-ARA5, AWA0-AWA5, ARB0-ARB5, AWB0-AWB5),
two latch enables (LEA, LEB), two output enables (OEA, OEB),
and six write enables (WEAC, WEAL, WEAH, WEBC, WEBL,
WEBH) that allow writing of data into one or both bytes of a
word. The separate read and write addresses facilitate creation
of three- and four-address architectures and allow
address set-up and RAM access to overlap.

Since the A and B sides are identical, only operation of the A
side is described. The address multiplexer provides the RAM
with the address ARA when WEAC = HIGH and with the
address AWA when WEAC = LOW. Internally the part is
designed so that there is no race condition between the write
address and the write enable. In most cases WEAC and LEA
will be connected to the clock as shown in Figure 2 so that
reading will take place in the first part of a clock cycle and
writing in the last part. The latch at the output of the RAM is
transparent when LEA = HIGH and retains the data when
LEA = LOW. The latch has an output YA controlled by OEA.
Each word is split into two bytes of nine bits that can be
individually written. The low byte covers bits 0 through 8 and
the high byte covers bits 9 through 17. One or both bytes of
the data at DA are written into the location given by AWA when
the common write enable (WEAC) and the appropriate byte
write enables (WEAL and WEAH) are active. Two special
cases arise. First, if a location is written into and read at the
same time, the value read is the value being written. Second, if
a location is written into from both the A side and the B side,
the value written is undefined, but the operation is not harmful.

Figure 1. Am29400 Family High-Performance System Block Diagram

The transparency mode during a write (WEA = LOW) allows
the data-in (DA) not only to be written into memory but also to
appear at the output (YA) when the output latch enable (LEA) is HIGH
and the output enable control (OEA) is LOW.


Extension To Four Read Ports and Two Write 
Ports 


A RAM with four read ports and two write ports can be made 
by using two dual access RAMs and connecting each of the 
write ports, write addresses, and write enables in parallel for 
the two devices. As an example, this RAM may provide data 
storage for a data ALU and an address adder as shown in 
Figure 3. A location should not be read before it has been 
written into for the first time as the contents of the two dual 
access RAMs are likely to be different upon power-up. 


32 Words x 36 Bits Single-Access RAM


It is possible to convert the 64 word x 18-bit dual-access RAM 
into a 32 word x 36-bit single-access RAM. This is done by 
storing the upper half of the 36 bits in the upper half of the 64 
words and addressing them from the A side. The lower half of 
the 36 bits should then be stored in the lower half of the 64 
words and addressed from the B side. This arrangement, 
which is shown in Figure 4, does not change the capacity of 
the RAM, but the dual access is lost. 


[Timing diagram; legible labels: address selection, read data, YA]

Figure 2. Read through YA and Write through DA in a Single Cycle (Two Bytes)

Figure 3. RAM with 4 Read Ports and 2 Write Ports

Figure 4. 32 x 36 RAM (Single Access) Using 64 x 18 Dual Access RAM

APPLICATIONS

Suggested Printed Circuit Board Layout
Bottom View

Connect VCCO directly to plane.
Connect VCC and VEE directly to plane from E-13 and J-13.

ABSOLUTE MAXIMUM RATINGS

Storage Temperature .......................... -65 to +150°C
Ambient Temperature with
  Power Applied .............................. -55 to +125°C
VEE Pin Potential to GND Pin ................. -7.0 V to +0.5 V
Input Voltage (DC) ........................... VEE to +0.5 V
Output Current (DC Output HIGH) .............. -30 mA to +0.1 mA

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.

OPERATING RANGES

Commercial (C) Devices
  Temperature ................................ 0 to +75°C
  Supply Voltage ............................. -5.46 V to -4.94 V
  Air Velocity ............................... 200 linear feet per minute

Operating ranges define those limits between which the
functionality of the device is guaranteed.

DC CHARACTERISTICS (Commercial) (Notes 1 and 2)

Columns: Parameter Symbol | Parameter Description | Test Conditions (Note 5) | TA | Min. (Note 3) | Typ. (Note 1) | Max. (Note 3)

Parameter Description | Test Conditions
Output Voltage HIGH | VIN = VIH Max. or VIL Min.
Output Voltage LOW (VOL) | VIN = VIH Max. or VIL Min.
Output Voltage HIGH | VIN = VIH Min. or VIL Max.
Output Voltage LOW | VIN = VIH Min. or VIL Max.
Input Voltage HIGH | Guaranteed Input Voltage HIGH for All Inputs
Input Voltage LOW | Guaranteed Input Voltage LOW for All Inputs
Input Current HIGH | VIN = VIH Max.
Input Current LOW | VIN = VIL Min.
Power Supply Current | All Inputs and Outputs Open

[Limit values are largely illegible; legible fragments, row assignment not recoverable: 1645, 1605, -810, -720, 1490, 1475, 1450; legible temperatures: +25°C, +75°C]


Notes: 1. Typical values are:
VEE = -5.2 V, VCC = GND, VCCO = GND
Output Load = 50 Ω and 30 pF to -2.0 V.

2. Guaranteed with transverse air flow exceeding 200 linear F.P.M. and 2-minute warm-up period. Typical thermal resistance values of the
package are:
θJA (Junction-to-Ambient) = 22°C/Watt (still air)
θJA (Junction-to-Ambient) = 7.5°C/Watt (at 200 F.P.M. air flow)
θJC (Junction-to-Case) = 5°C/Watt

3. These are absolute voltages with respect to device ground pin and include all overshoots due to system and/or tester noise. Do not
attempt to test these values without suitable equipment.
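
The thermal resistance values in Note 2 can be used for a first-order junction temperature estimate with the usual relation TJ = TA + P × θJA. The sketch below is illustrative only: the function name is hypothetical, and the 4 W dissipation used in the usage note is an assumed figure, not a value from this data sheet.

```python
# Typical thermal resistances quoted in Note 2, in degrees C per watt.
THETA_JA_STILL_AIR = 22.0
THETA_JA_200_FPM = 7.5
THETA_JC = 5.0


def junction_temperature(ambient_c, power_w, theta_ja):
    """First-order estimate: ambient temperature plus power times theta-JA."""
    return ambient_c + power_w * theta_ja
```

For an assumed dissipation of 4 W at the +75°C upper ambient limit, `junction_temperature(75.0, 4.0, THETA_JA_200_FPM)` gives 105°C with 200 F.P.M. air flow, which illustrates why the operating range specifies an air velocity.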


SWITCHING CHARACTERISTICS (Commercial Only)

No. | Parameters | From | To
1 | Access Time | ARA or ARB | YA or YB
2 | Turn-on Time | OEA or OEB = L | YA or YB
3 | Turn-off Time | OEA or OEB = H | YA or YB = L
4 | Enable Time | LEA or LEB | YA or YB
  | Latch Close | | WEA or WEB

Minimum Pulse Widths

Parameters | Input | Pulse
Write Pulse | WEA or WEB | HIGH - LOW - HIGH
Latch Data Capture | LEA or LEB | LOW - HIGH - LOW

[Test conditions and time (ns) limits not legible]

WEA = WEAC • (WEAL + WEAH)
WEB = WEBC • (WEBL + WEBH)
YA and YB are tested independently.












SWITCHING TEST CIRCUIT          SWITCHING TEST WAVEFORM

tr = tf = 2.5 ns TYP

RT = 50 Ω termination of measurement system
CL = 30 pF (including stray jig capacitance)
KEY TO SWITCHING WAVEFORMS

WAVEFORM | INPUTS | OUTPUTS
(steady level) | Must be steady | Will be steady
(falling slope) | May change from H to L | Will be changing from H to L
(rising slope) | May change from L to H | Will be changing from L to H
(crosshatch) | Don't care; any change permitted | Changing; state unknown
(center line) | Does not apply | Center line is high impedance "off" state
SWITCHING WAVEFORMS

[Waveform: Read Function (same for B Port)]

[Waveform: Transparency Function (same for B Port). Note: LEA = HIGH, OEA = LOW.]

[Waveform: Write Function (same for B Port)]
I/O CURRENT INTERFACE DIAGRAM

[Circuit diagrams: input circuit and output circuit]
Am29325

32-Bit Floating-Point Processor

DISTINCTIVE CHARACTERISTICS

• Single VLSI device performs high-speed floating-point
  arithmetic
  - Floating-point addition, subtraction, and multiplication
    in a single clock cycle
  - Internal architecture supports sum-of-products,
    Newton-Raphson division
• 32-bit, three-bus flow-through architecture
  - Programmable I/O allows interface to 32- and 16-bit
    systems
• IEEE and DEC formats
  - Performs conversions between formats
  - Performs integer ↔ floating-point conversions
• Six flags indicate operation status
• Register enables eliminate clock skew
• Input and output registers can be made transparent
  independently


GENERAL DESCRIPTION 


The Am29325 is a high-speed floating-point processor unit.
It performs 32-bit single-precision floating-point addition,
subtraction, and multiplication operations in a single VLSI
circuit, using the format specified by the proposed IEEE
floating-point standard, P754. The DEC single-precision
floating-point format is also supported. Operations for
conversion between 32-bit integer format and floating-point
format are available, as are operations for converting
between the IEEE and DEC floating-point formats. Any
operation can be performed in a single clock cycle. Six
flags (invalid operation, inexact result, zero, not-a-number,
overflow, and underflow) monitor the status of operations.


The Am29325 has a three-bus, 32-bit architecture, with two
input buses and one output bus. This configuration provides
high I/O bandwidth, allows access to all buses and affords
a high degree of flexibility when connecting this device in a
system. All buses are registered with each register having a
clock enable. Input and output registers may be made
transparent independently. Two other I/O configurations, a
32-bit, two-bus architecture and a 16-bit, three-bus architecture,
are user-selectable, easing interface with a wide
variety of systems. Thirty-two-bit internal feedforward datapaths
support accumulation operations, including sum-of-products
and Newton-Raphson division.

Fabricated with the high-speed IMOX™ bipolar process,
the Am29325 is powered by a single 5-volt supply. The
device is housed in a 145-terminal pin-grid-array package.

Publication # 05621  Rev. D  Amendment /0
Issue Date: April 1987
Am29337

16-Bit Bounds Checker

DISTINCTIVE CHARACTERISTICS

• Double Comparator
  - Compares a 16-bit input number with a lower limit and
    an upper limit
• Cascadable
  - 16-bit cascadable to longer words
• Out-of-Bounds Flag
  - Flags values that are outside the bounds of a lower
    and an upper limit
• Compares Signed or Unsigned Numbers
• 28-Pin Packages

GENERAL DESCRIPTION

The Am29337 is a 16-bit bounds checker that compares
a 16-bit signed or unsigned number with a lower and an
upper limit stored in the registers. The part flags values that
are out of bounds, or triggers a counter used to count the
number of values that lie within the given range.

The Am29337 is cascadable up to 32 bits or greater.

BLOCK DIAGRAM

[Block diagram; legible blocks: two comparators]

Publication # 08546  Rev. B  Amendment /0
Issue Date: June 1987


RELATED AMD PRODUCTS

Part No. | Description
 | Bipolar Bit-Slice Family
Am29117 | Bipolar 16-Bit Two-Port Microprogrammable Controller
Am29C117 | CMOS 16-Bit Two-Port Microprogrammable Controller
Am29332 | Bipolar 32-Bit Non-Cascadable ALU
Am29334 | Bipolar 64 x 18 Four-Port Dual-Access Register File
Am29C116 | CMOS 16-Bit Microprogrammable Controller
Am29C334 | CMOS 64 x 18 Four-Port Dual-Access Register File


CONNECTION DIAGRAM
Top View

[28-pin package outline; legible pin label: 28 SIGNED]

Note: Pin 1 is marked for orientation.

LOGIC SYMBOL

[Logic symbol]

Die Size: 117 x 143
Gate Count: 250






ORDERING INFORMATION
Standard Products

AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is formed by
a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing

AM29337 D C B

e. OPTIONAL PROCESSING
Blank = Standard processing
B = Burn-in

d. TEMPERATURE RANGE
C = Commercial (0 to +70°C)

c. PACKAGE TYPE
D = 28-Pin Sidebrazed Ceramic DIP (SD4028)

b. SPEED OPTION
Not Applicable

a. DEVICE NUMBER/DESCRIPTION
Am29337
16-Bit Bounds Checker

Valid Combinations

AM29337 | DC, DCB

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released combinations, and
to obtain additional data on AMD's standard military grade
products.
ORDERING INFORMATION (Cont'd.)
APL Products

AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL
products is formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Device Class
d. Package Type
e. Lead Finish

AM29337 /B X C

e. LEAD FINISH
C = Gold

d. PACKAGE TYPE (per 09-000)
X = 28-Pin (400 mil) Sidebrazed Ceramic DIP
(SD4028)

c. DEVICE CLASS
/B = Class B

b. SPEED OPTION
Not Applicable

a. DEVICE NUMBER/DESCRIPTION
Am29337
16-Bit Bounds Checker

Valid Combinations

AM29337 | /BXC

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations or to check for newly released valid
combinations.

Group A Tests

Group A tests consist of Subgroups
1, 2, 3, 7, 8, 9, 10, 11.











DESCRIPTION 


Ci, Cly Carry-In (Inputs) © 
Carry input for cascading. 


COL, COy Carry Out (Outputs) 


Carry outputs for the result of comparison. 


CP System Clock (input) 
Clocks limit registers at the LOW-to-HIGH transition. 


Do-Di5 Data Input (Input) 
Input to the comparators and limit registers. 


FUNCTIONAL DESCRIPTION 


The Am29337 is a high-speed bounds checker that deter- 
mines if a 16-bit number lies within a lower and an upper limit. 
It consists of two comparators and two limit registers, as 
shown in the Block Diagram. 


Limit Registers, Double Comparator 


The Am29337 has a lower limit register and an upper limit
register. The values of these two registers are loaded from the
D-bus with the load enable inputs ENL and ENU on the clock's
rising edge. The values of the data present on the D-bus are
compared with the values stored in the limit registers through
the two comparators. The comparators operate on signed
numbers when SIGNED is HIGH and on unsigned numbers
when it is LOW. The results of the comparisons are given by
the outputs COL, COU, and OOB. The definitions of carry
inputs CIL and CIU are given in Table 1, and the combination
of the different regions in Table 2. If the data being compared
is out of the region, the out-of-bounds flag, OOB, which is
defined as COL*COU, is set.
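
The comparison described above can be sketched behaviorally as follows. This is a minimal model, not a pin-accurate one: it does not reproduce the COL/COU encodings of Table 1, the function names are hypothetical, and treating both limits as inclusive is an assumption.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def in_bounds(d, lower, upper, signed=False):
    """Return True when lower <= d <= upper for 16-bit operands.

    signed=True mirrors SIGNED HIGH; signed=False mirrors SIGNED LOW.
    Inclusive limits are an assumption of this sketch.
    """
    if signed:
        d, lower, upper = (to_signed16(v) for v in (d, lower, upper))
    else:
        d, lower, upper = (v & 0xFFFF for v in (d, lower, upper))
    return lower <= d <= upper


# The same bit patterns give different answers in the two modes:
# 0xFFFF is 65535 unsigned but -1 signed.
unsigned_result = in_bounds(0xFFFF, 0xFFF0, 0x0010, signed=False)
signed_result = in_bounds(0xFFFF, 0xFFF0, 0x0010, signed=True)
```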


ENL, ENU Load Enable (Inputs)
Load enables for the limit registers.


OOB Out-of-Bounds Flag (Output)
Flags values that are out of bounds. Defined as COL + COU.


SIGNED Sign Input (Input)
Selects signed comparisons when HIGH and unsigned
comparisons when LOW.





Cascading 


Comparison of numbers longer than 16 bits requires cascading
of two or more bounds-checker slices. Figure 1 shows an
example of this for a 32-bit bounds checker. The comparison
starts from the least significant slice. COL, COU, and OOB of
the most significant slice act as outputs of the overall bounds
checker, while COL and COU of the least significant slice are
connected to CIL and CIU of the most significant slice. CIL and
CIU of the least significant slice act as inputs to the overall
bounds checker. The SIGNED input of the most significant
slice selects whether the value is compared as a signed or an
unsigned number; the SIGNED input of the least significant
slice is tied LOW.


The comparison can also start from the most significant slice.
In this case, COL, COU, and OOB of the least significant slice
act as outputs of the overall bounds checker, while COL and
COU of the most significant slice are connected to CIL and CIU
of the least significant slice.
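
The first cascading arrangement (least significant slice first) can be illustrated with a small behavioral sketch. It assumes unsigned operands and models only the idea that a slice's carry-in decides the result when that slice's own 16 bits are equal; the actual CO/CI polarities of Table 1 are not modeled, and the function names are hypothetical.

```python
def slice_less_than(a, b, carry_in):
    """16-bit 'a < b'; carry_in supplies the answer when a == b."""
    a &= 0xFFFF
    b &= 0xFFFF
    if a != b:
        return a < b
    return carry_in


def less_than_32(a, b):
    """32-bit unsigned 'a < b' built from two cascaded 16-bit slices."""
    # Least significant slice: carry-in is the overall input (False here).
    low = slice_less_than(a & 0xFFFF, b & 0xFFFF, False)
    # Most significant slice: its carry-in is the low slice's result.
    return slice_less_than(a >> 16, b >> 16, low)
```

The most significant slice dominates whenever the upper halves differ, which is why its outputs serve as the outputs of the overall checker.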


TABLE 1. DEFINITION OF COL AND COU

[Table body not recoverable.]

Note:

D = Data Input
L = Lower Limit
U = Upper Limit


TABLE 2. DIFFERENT COMBINATIONS OF REGIONS 


[Table body not recoverable. Column groups: Inputs, Output.
Several rows are marked "Impossible Combination".]


Figure 1. 32-Bit Bounds Checker

ABSOLUTE MAXIMUM RATINGS

Storage Temperature                        -65 to +150°C
Temperature Under Bias (TC)                -55 to +125°C
Supply Voltage to Ground
  Potential Continuous                     -0.5 to +7.0 V
DC Voltage Applied to Outputs
  for HIGH State                           -0.5 V to VCC Max.
DC Input Voltage                           -0.5 to +5.5 V
DC Output Current, into Outputs
DC Input Current

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES

Commercial (C) Devices
  Temperature (TA)                         0 to +70°C
  Supply Voltage (VCC)                     +4.75 to +5.25 V

Military (M) Devices
  Temperature (TC)                         -55 to +125°C
  Supply Voltage (VCC)                     +4.5 to +5.5 V

Operating ranges define those limits between which the
functionality of the device is guaranteed.

Thermal Resistance (Preliminary) - SD4028
  θJA = 40°C/W
  θJC = 15°C/W


DC CHARACTERISTICS over operating range unless otherwise specified (for APL Products, Group A, 
Subgroups 1, 2, 3 are tested unless otherwise noted) 


Parameter Description                Test Conditions (Note 1)

Output HIGH Voltage                  VCC = Min., VIN = VIL or VIH,
                                     IOH = -1.0 mA
Output LOW Voltage                   VCC = Min., VIN = VIL or VIH,
                                     IOL = 8.0 mA
Input HIGH Level                     Guaranteed Input Logical
                                     HIGH Voltage for All Inputs
Input LOW Level                      Guaranteed Input Logical
                                     LOW Voltage for All Inputs
Input Clamp Voltage                  VCC = Min., IIN = -18 mA
Input LOW Current                    VCC = Max., VIN = 0.5 V
Input HIGH Current                   VCC = Max., VIN = 2.4 V
Input HIGH Current                   VCC = Max., VIN = 5.5 V
Off-State (High-Impedance)
  Output Current (IOZL)              VCC = Max., VO = 0.4 V
Output Short-Circuit
  Current (ISC) (Note 2)             VCC = Max.
Power Supply Current (ICC)           TA = +25°C           180
                                     TA = 0 to +70°C      230
                                     TA = +70°C
                                     TC = -55 to +125°C
                                     TC = +125°C          215

[Most Min./Max. limits are not recoverable; one surviving limit
reads -0.5 mA.]


Notes: 1. For conditions as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type. 
2. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second. 
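
The supply-current and thermal-resistance figures above can be combined into a rough estimate of junction temperature rise. The sketch below is a back-of-the-envelope check only: it assumes the ICC values are in mA, takes the commercial worst case (VCC = 5.25 V, ICC = 230), uses the preliminary θJA of 40°C/W, and ignores power dissipated in output loads. The function name is hypothetical.

```python
def junction_rise(vcc_volts, icc_ma, theta_ja):
    """Return (power in W, temperature rise above ambient in deg C)."""
    power_w = vcc_volts * icc_ma / 1000.0
    return power_w, power_w * theta_ja


# Commercial worst case: VCC = 5.25 V, ICC = 230 (assumed mA),
# theta-JA = 40 C/W (preliminary, SD4028 package).
power, rise = junction_rise(5.25, 230, 40.0)
print(f"{power:.2f} W, {rise:.1f} C above ambient")
```

Under these assumptions the die runs roughly 48°C above ambient in still air, which is worth keeping in mind at the +70°C end of the commercial range.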


SWITCHING CHARACTERISTICS over operating range unless otherwise specified (for APL Products,
Subgroups 9, 10, 11 are tested unless otherwise noted)

Parameter Description

Propagation delay, D0-D15 to COL, COU, OOB
Propagation delay, SIGNED to COL, COU, OOB
Propagation delay to COL, COU, OOB (source input illegible)
D0-D15 Setup Time with Respect to CP↑
ENL, ENU Setup Time with Respect to CP↑
D0-D15 Hold Time
ENL, ENU Hold Time

[Limits are specified in ns; the values are not recoverable.]
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SWITCHING TEST CIRCUIT 


R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2)

R2 = 2.4 V / IOH


Normal Outputs


Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture.
2. S1 is closed during function tests and all AC tests except output enable tests.
3. CL = 5.0 pF for output disable tests.





SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS

INPUTS                          OUTPUTS

MUST BE STEADY                  WILL BE STEADY
MAY CHANGE FROM H TO L          WILL BE CHANGING FROM H TO L
MAY CHANGE FROM L TO H          WILL BE CHANGING FROM L TO H
DON'T CARE; ANY CHANGE          CHANGING; STATE UNKNOWN
  PERMITTED
DOES NOT APPLY                  CENTER LINE IS HIGH IMPEDANCE
                                  "OFF" STATE


SWITCHING WAVEFORMS (Cont'd.) 


[Waveform: D0-D15 to COL, COU, OOB]

Propagation Delays from Data Input to Output


[Waveform: D0-D15 and load enables relative to CP; COL, COU, OOB]

Loading the Limit Registers
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_ INPUT/OUTPUT CIRCUIT DIAGRAM 


DRIVEN INPUT

CI ≈ 5.0 pF, all inputs


DRIVING OUTPUT

CO ≈ 5.0 pF, all outputs


Am29338 


32-Bit Byte Queue 


ADVANCE INFORMATION 


DISTINCTIVE CHARACTERISTICS 


• Intelligent FIFO Array
  - Array of four intelligent FIFO buffers, each 9 bits wide,
    32 words deep (RAM-based)
• Queuing/Dequeuing
  - Allows variable width queuing/dequeuing in one cycle
• Byte Rotation
  - Four bytes can be rotated at the input as well as at the
    output of the Byte Queue. This allows interfacing
    between incompatible byte assignments.
• Asynchronous and Synchronous Operation
  - Supports communication between systems with different
    clocks and different bus widths
• Retransmit
  - Data can be read out repeatedly
• Horizontal Cascading
  - Up to four devices allow simultaneous input or output of
    up to 16 bytes
• Parity Check
  - Protects data at the input and the output


GENERAL DESCRIPTION 


The Am29338 is an intelligent FIFO that allows up to four 
bytes to be queued and up to four bytes to be dequeued in 
a single cycle. When four devices are cascaded horizontal- 
ly, up to sixteen bytes can be dequeued in a single cycle. 


The Am29338 queues variable-length data by disassem- 
bling the input data, which is aligned on the least-significant 
byte of the input bus (D), into individual bytes. These bytes 
are packed internally in FIFO (first-in, first-out) order. The 
data to be dequeued is unpacked and realigned to the 
least-significant byte of the output bus (Y). Queuing and 
dequeuing can be performed simultaneously. With the 


retransmit capability, the part can repeatedly send the 
block of data stored in the queue without having to requeue 
it. This is useful for retransmitting a block of data upon 
receipt of an error in I/O applications or for loop-locking in
instruction-prefetch applications. 


The queue operates in synchronous or asynchronous 
mode, and is useful as an instruction-prefetch queue or as 
a general-purpose FIFO buffer. 


The device is manufactured in AMD's bipolar IMOX* 
technology and comes in a 120-lead pin-grid-array pack- 
age. 


BLOCK DIAGRAM 


[Block diagram not reproduced. Recoverable labels: D and P
inputs; input Byte Rotator; four 32 x 9 Memory and Slice Logic
blocks; output Byte Rotator; Y and PY0-3 outputs; status
outputs Full, Almost Full, Empty (EMPTY), Almost Empty
(A-EMPTY), and Bytes in Queue.]


This document contains information on a product under development at Advanced Micro Devices,
Inc. The information is intended to help you to evaluate this product. AMD reserves
the right to change or discontinue work on this proposed product without notice.
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CONNECTION DIAGRAM 
Bottom View 


A 8 Cc D E F G H J K 
1 Y21 GNOT PY3 Y27 Y28  VCCT CNT2 


2 Y20 Y23  Y24 Y26 Y29 Y31  CNT1 


3 Y19 Y22 VCCE Y25 GNDE Y30 CNTO 


10 
11 PDO D2 \VCCE D6 D7) D1i2 GNDE 015 
12 POSO O1 VCCE 03 D8 D9 + GNDE Di4 


13 . PD1 Di 


Legend: GNDE: GND, ECL
GNDT: GND, TTL
VCCE: VCC, ECL
VCCT: VCC, TTL
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PIN DESIGNATIONS 





(Sorted by Pin Number) 
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PIN DESIGNATIONS 
(Sorted by Pin Name) 
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LOGIC SYMBOL 


Inputs: QCLK, DQCLK, QEN, DQEN, BQ0-BQ1, BDQ0-BDQ3,
RESET, RXMIT, POS0-POS1, BSW0-BSW1, OE

Outputs: FULL, EMPTY, A-FULL, A-EMPTY, CNT0-CNT6


LS002851 


METALLIZATION AND PAD LAYOUT 


[Pad layout not reproduced. Recoverable pad labels: ECL GND,
TTL VCC, EMPTY, FULL, A-FULL, PDERR, PYERR.]


Die Size: 270 x 290 mils
Gate Count: 9000 
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ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is formed by 
a combination of: a. Device Number 

b. Speed Option (if applicable) 

c. Package Type 

d. Temperature Range 

e. Optional Processing 


AM29338 G C B


e. OPTIONAL PROCESSING


Blank = Standard processing 
B = Burn-in 


d. TEMPERATURE RANGE 
C = Commercial (0 to +85°C)


c. PACKAGE TYPE 
G = 120-Lead Pin Grid Array with Heatsink 
(CG 120) 


b. SPEED OPTION 
Not Applicable 


a. DEVICE NUMBER/DESCRIPTION 
Am29338 
Byte Queue 


Valid Combinations 


AM29338 GC, GCB

Valid Combinations list configurations planned to be supported
in volume for this device. Consult the local AMD sales office to
confirm availability of specific valid combinations, to check on
newly released combinations, and to obtain additional data on
AMD's standard military grade products.
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PIN DESCRIPTION 


A-EMPTY Almost Empty (Output; Active HIGH)
Indicates that there are fewer than four bytes of data in the
queue. It is used in either synchronous or asynchronous
operation.

A-FULL Almost Full (Output; Active HIGH)
Indicates that there are fewer than four bytes of space
remaining. It is used in either synchronous or asynchronous
operation.

BDQ0-BDQ3 Bytes Dequeued (Input)
Selects the number of bytes to be dequeued (see Table 2).
The byte queue must operate synchronously to be able to
dequeue more than four bytes in a single cycle.

BQ0-BQ1 Bytes Queued (Input)
Selects the number of bytes to be queued (see Table 1).

BSW0-BSW1 Byte Swap (Input)
Allows the bytes on the input to be reordered (see Table 3).

CNT0-CNT6 Byte Count (Output)
Gives the current number of bytes in the queue. These are
used only in synchronous operation.

D0-D31 Data Input (Input)
Data inputs to be queued.

DQCLK Dequeue Clock (Input)
Dequeues bytes presented on the Y bus. A LOW-to-HIGH
transition on this input advances the internal dequeue
pointers by the number set up on the BDQ lines.


DQEN Dequeue Enable (Input; Active LOW) 
While DQEN is LOW, dequeuing is performed normally. 
When DQEN is HIGH, DQCLK is disabled. 


EMPTY Empty (Output; Active HIGH)
Indicates that the queue is empty. It is used in either
synchronous or asynchronous operation.


FULL Full (Output; Active HIGH)
Indicates that the queue is full. It is used in either
synchronous or asynchronous operation.


OE Output Enable (Input; Active LOW)
When OE is LOW, the four bytes following the current
dequeue pointer and the corresponding parity bits are on Y
and PY outputs. When OE is HIGH, Y and PY outputs are
three stated.


PD0-PD3 Data Input Parity (Input)
The input parity bits for the corresponding bytes on the D
inputs. Only the bytes to be queued and the corresponding
PD lines are checked for possible parity error. The byte
queue uses even parity.


PDERR Data Input Parity Error (Output; Active HIGH)
If any of the bytes to be queued has a parity error, PDERR
is asserted.


POS0-POS1 Position (Input)
These inputs are used to program the location of each byte
queue in a horizontally cascaded system upon RESET (see
Table 4).


PY0-PY3 Output Data Parity (Output; Three State)
The output parity bits for Y outputs. When OE is LOW, the
parity bits of the four bytes following the dequeue pointer
appear on these outputs. The byte queue uses even parity.


PYERR Y Output Parity Error (Output; Active HIGH)
If any of the bytes on the output has a parity error, PYERR is
asserted.


QCLK Queue Clock (Input) 
When QCLK is LOW, the number of bytes set up on the BQ 
lines are written into the next free space in the queue from 
the data set up on the D inputs. On a LOW-to-HIGH 
transition of this input, the internal queue pointers are 
updated. If QEN is HIGH, QCLK has no effect. 


QEN Queue Enable (Input; Active LOW) 
When QEN is LOW, queuing is performed normally. When 
QEN is HIGH, QCLK is disabled. 


RESET Reset (Input; Active LOW) 
When RESET is LOW, both the internal queue pointer and 
the internal dequeue pointer are reset to the first RAM 
location and both EMPTY and A-EMPTY are asserted.


RXMIT Retransmit (Input; Active LOW)
When RXMIT is LOW, the internal dequeue pointers are
reset to the first RAM location while the internal queue
pointers remain unchanged. This allows the data contained
between the current queue pointer and the first RAM
location to become available for dequeuing again. The
effect of asserting RXMIT is defined only if 128 bytes or fewer
have been queued since the last assertion of RESET (see
Figure 5).

Yo-Y31 Data Output (Output; Three State) 
The four bytes following the current dequeue pointer appear 
on these outputs when OE is LOW. When OE is HIGH, they 
are three stated. 
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FUNCTIONAL DESCRIPTION 
Architecture 





The Am29338 is a 32-bit high-performance general-purpose 
intelligent FIFO that stores up to 128 bytes in the internal RAM 
slices and queues or dequeues up to four bytes in a single 
cycle. The byte queue is divided into five functional blocks: 1) 
four memory-slice logics, 2) byte rotators for input and output 
buses, 3) rotate-enable logic, 4) byte-count logic, and 5) full/ 
empty-generate logic. The byte-oriented parity checking is 
provided on both the D-input bus and the Y-output bus. Figure 
1 shows a detailed block diagram of the byte queue. 


Memory-Slice Logic 


Figure 2 shows a detail of the memory-slice logic. It consists of 
a 32x9 RAM, queue and dequeue pointers, adders for the 
pointers, and a full/empty detector. The RAM has indepen- 
dent 9-bit read and write ports. Both ports are accessible 
simultaneously if different RAM locations are operated on. A 
parity bit is stored along with its corresponding byte into the 
RAM. 
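
Since each RAM location holds a byte plus its parity bit, and the pin description states that the byte queue uses even parity, the check can be sketched as below. The function names are hypothetical, and the sketch shows only the parity arithmetic, not the device's PDERR/PYERR timing.

```python
def even_parity_bit(byte):
    """Parity bit that makes the number of 1s in the 9-bit word even."""
    return bin(byte & 0xFF).count("1") & 1


def has_parity_error(byte, parity_bit):
    """True when the supplied parity bit disagrees with even parity."""
    return even_parity_bit(byte) != (parity_bit & 1)


# 0x03 has two 1s, so its even-parity bit is 0; 0x07 has three, so 1.
stored = [(b, even_parity_bit(b)) for b in (0x00, 0x03, 0x07, 0xFF)]
```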


The queue and dequeue pointers point to the next locations
available for queuing and dequeuing, respectively. The next
locations are produced by the internal adders from BQ0-1 or
BDQ0-3 and the current pointer values. When RESET is
asserted, both pointers are set to zero and the RAM is flushed.
These pointers are also used to indicate that the RAM is either
empty or full for each memory slice. The slice-empty or
slice-full signal is used to combinationally form the FULL,
A-FULL, EMPTY, and A-EMPTY signals.


Byte Rotator 


There are two byte rotators in the byte queue. Each accepts 
36-bit wide data and performs rotation of bytes according to 
the 2-bit rotate values fed from the rotate-enable logic. The 
input byte rotator realigns and stores the bytes to be queued 
into the next free slice location. The output byte rotator 
realigns the bytes to be dequeued to the least significant byte 
of the Y-output bus. 


Rotate-Enable Logic 


The queue and dequeue rotate-enable logic keeps track of
which slice holds the first byte of the next queue/dequeue
operation. A modulo-4 counter selects the rotation applied to
the data and enables the correct slices according to the number
of bytes specified by either BQ0-1 or BDQ0-3.
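
The modulo-4 bookkeeping can be illustrated with a short sketch. It assumes that slices are numbered 0 to 3 and that consecutive bytes go to consecutive slices, wrapping around; the function name is hypothetical.

```python
def slices_for_operation(first_slice, nbytes):
    """Return (slices enabled, first slice of the next operation).

    first_slice is the modulo-4 counter value before the operation;
    nbytes is the number of bytes queued or dequeued (0 to 4).
    """
    if not 0 <= nbytes <= 4:
        raise ValueError("nbytes must be between 0 and 4")
    enabled = [(first_slice + i) % 4 for i in range(nbytes)]
    return enabled, (first_slice + nbytes) % 4


# Two bytes queued, then three: the second operation wraps around.
first_op, counter = slices_for_operation(0, 2)
second_op, counter = slices_for_operation(counter, 3)
```

The wraparound in the second operation is what requires the byte rotator: the incoming least-significant byte must be steered to slice 2 rather than slice 0.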


The queue rotate-enable logic also performs byte and/or word
swaps on the incoming data. The input bytes are swapped in
one of four ways, according to Table 3, with BSW0-1 and the
current modulo-4 byte count through the input byte rotator.


Byte-Count Logic 


This logic consists of a queue count register and a dequeue 
count register. The registers are incremented during a queue/ 
dequeue operation by the number of bytes in the operation. 
The combinational subtract logic outside of these registers 
determines the number of bytes stored in the byte queue. 
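
A minimal sketch of the subtract-and-compare idea follows. The register width (8 bits here) is an assumption chosen so that counts from 0 to 128 remain distinguishable after wraparound; the almost-empty and almost-full thresholds follow the pin descriptions (fewer than four bytes of data or of space). The names are hypothetical.

```python
CAPACITY = 128  # bytes, per the functional description


def queue_status(queue_count, dequeue_count, width=8):
    """Derive the byte count and status flags from two wrapping counters."""
    count = (queue_count - dequeue_count) % (1 << width)
    return {
        "count": count,
        "empty": count == 0,
        "a_empty": count < 4,
        "full": count == CAPACITY,
        "a_full": CAPACITY - count < 4,
    }


# The subtraction still works after the queue counter has wrapped.
wrapped = queue_status(queue_count=2, dequeue_count=254)
```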
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Figure 1. Am29338 Byte Queue Detailed Block Diagram 
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Figure 2. Memory and Slice Logic 
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Figure 3. Position Line Values in Horizontally Cascaded System 
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MOST LEAST 
SIGNIFICANT SIGNIFICANT 
CHIP CHIP 


Pointer Pointer 


Dequeue 
Pointer 


BD006930 


a) Before RXMIT (COUNT = 8)        b) After RXMIT (COUNT = 16)

[Figure labels: Queue Pointer, Dequeue Pointer, Previously
Queued Data]


Figure 5. Retransmit Function with the Am29338 


a) Before First Queue Operation
b) Before Second Queue Operation
c) After Second Queue Operation


Figure 6. Queuing with the Am29338 


Notes: 1. Each of the four segments stands for a memory slice; MSB = Most-Significant Byte, and
LSB = Least-Significant Byte.


a) After First Dequeue Operation
b) Before Second Dequeue Operation
c) After Second Dequeue Operation


Figure 7. Dequeuing with the Am29338 


Notes: 1. Each of the four segments stands for a memory slice; MSB = Most-Significant Byte, and
LSB = Least-Significant Byte.


2. First, one byte is dequeued (‘A’), followed by a dequeue of two bytes (‘CB’). 


TABLE 1. SELECTING THE NUMBER OF BYTES TO BE QUEUED

[Table body not recoverable.]

Key: L = LOW
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TABLE 2. SELECTING THE NUMBER OF BYTES TO BE DEQUEUED 
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Key: L=LOW 
H= HIGH 


* This is possible when four of the byte queues are cascaded together. The byte queue must be operated 
synchronously to select more than four bytes for dequeuing. 
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TABLE 3. ENCODING OF BSW INPUTS 


Note: "0" stands for the least significant chip and "3" the most significant chip.


Operational Modes 
General Operation 


To enter data into the Am29338, the number of bytes to be 
queued is set up on the Bytes Queued (BQ) pins; the 
corresponding data to be queued is set up on the Data Input 
(D) and Data Input Parity (PD) pins, aligned to the least- 
significant byte. If Queue Enable (QEN) is asserted, the data is 
entered into the Am29338 while the Queue Clock (QCLK) is 
LOW, and the internal queue pointers are updated on the 
LOW-to-HIGH transition of QCLK. 


Figure 6 shows an example of two bytes being queued, 
followed by three bytes being queued. Data is packed in the 
Am29338 so that no holes exist. 


If Output Enable (OE) is asserted, the first four bytes available
for dequeuing and their corresponding parity appear on the 
Data Output (Y) and Data Parity (PY) pins. The number of 
these bytes to be dequeued is set up on the Bytes Dequeued 
(BDQ) pins. If Dequeue Enable (DQEN) is asserted, the LOW- 
to-HIGH transition of Dequeue Clock (DQCLK) updates the 
internal dequeue pointers, removing the dequeued bytes. 


Figure 7 shows an example of one byte dequeued, followed by 
a dequeue of two bytes. The data to be dequeued next is 
least-significant-byte aligned on the output bus. 
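
The packing and realignment behavior described in this section can be captured in a small behavioral model. It is a sketch only: clocks, enables, and parity are left out, the class and method names are hypothetical, and it assumes the least-significant byte of a queued word enters the queue first.

```python
from collections import deque


class ByteQueueModel:
    """Behavioral sketch of byte packing and output alignment."""

    CAPACITY = 128  # bytes, per the functional description

    def __init__(self):
        self._bytes = deque()

    def __len__(self):
        return len(self._bytes)

    def queue(self, data, nbytes):
        """Queue the nbytes least-significant bytes of a 32-bit word."""
        if not 0 <= nbytes <= 4:
            raise ValueError("nbytes must be between 0 and 4")
        if len(self._bytes) + nbytes > self.CAPACITY:
            raise OverflowError("queue is full")
        for i in range(nbytes):
            self._bytes.append((data >> (8 * i)) & 0xFF)

    def peek(self):
        """Next (up to) four bytes, least-significant-byte aligned."""
        word = 0
        for i, b in enumerate(list(self._bytes)[:4]):
            word |= b << (8 * i)
        return word

    def dequeue(self, nbytes):
        """Remove nbytes from the head of the queue."""
        if not 0 <= nbytes <= min(4, len(self._bytes)):
            raise ValueError("cannot dequeue that many bytes")
        for _ in range(nbytes):
            self._bytes.popleft()


q = ByteQueueModel()
q.queue(0xBBAA, 2)      # two bytes, as in the first queue operation
q.queue(0xEEDDCC, 3)    # three more, packed with no holes
first_view = q.peek()   # what the Y bus would show
q.dequeue(1)
second_view = q.peek()  # remaining data, realigned to the LSB
```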


Synchronous Mode 


Both synchronous and asynchronous operations are available 
for the byte queue. During synchronous operation, both QCLK 
and DQCLK must be asserted on the edge of a common clock 
within certain skew limits. The following signals can be used 
as valid status outputs for this mode: FULL, A-FULL, EMPTY, 
A-EMPTY, and CNT0-6. Refer to the applications section for
an example. 


Asynchronous Mode 


During asynchronous operation, QCLK and DQCLK clocks 
may be different. It is possible to execute queue and dequeue 
operations simultaneously if different locations are accessed. 
In this mode, CNT outputs are not guaranteed as valid and 
horizontal cascading is not possible. Refer to the applications 
section for an example. 


Horizontal Cascading 


In synchronous operation, four byte queues can be horizontally
cascaded together. In this case, each of the four byte
queues holds the same data and up to sixteen bytes may be
dequeued in a single cycle, as shown in Table 2, and Figures 3 
and 4. Each part has to be programmed with its position by the 
POS inputs, as shown in Table 4. In a normal operation, the 
internal dequeue pointer of each part is displaced according to 
the POS inputs. When RESET or RXMIT is asserted, the 
dequeue pointers are offset by the value programmed on the 
POS inputs. 
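
One way to picture horizontal cascading is sketched below. It assumes that the part programmed with position p presents the four bytes starting 4 x p bytes past the head of the queue; this offset rule is an assumption of the sketch, since the POS encoding itself is not reproduced here. The function names are hypothetical.

```python
def cascaded_outputs(queue_bytes, parts=4):
    """Bytes shown on each part's Y bus, keyed by assumed position."""
    return {
        pos: list(queue_bytes[4 * pos:4 * pos + 4]) for pos in range(parts)
    }


def combined_output(queue_bytes, parts=4):
    """All bytes visible at once across the cascaded parts."""
    views = cascaded_outputs(queue_bytes, parts)
    combined = []
    for pos in range(parts):
        combined.extend(views[pos])
    return combined


data = bytes(range(32))          # every part holds the same data
views = cascaded_outputs(data)
window = combined_output(data)   # up to sixteen bytes in one cycle
```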


Horizontal cascading is useful in instruction buffers designed 
for systems with large, variable instructions that can span 
many bytes. 


APPLICATIONS 


Using Am29338 as an Instruction-Prefetch 
Queue 


Figure 8 shows the Am29338 used as an instruction-prefetch
queue. Sequential 32-bit memory locations are fetched by the
Instruction Fetch Unit (IFU) and are queued up in the byte
queue. When the central processor needs the next instruction,
it looks at the next four bytes from the byte queue. The central
processor then determines the instruction length from the
opcode and updates the dequeue pointer in the byte queue by
setting up the instruction length on the BDQ lines and
asserting DQCLK. When a jump occurs, the IFU flushes the
queue by asserting the RESET input and begins from the new
address. For this application, the byte queue must be in
synchronous mode.
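
The look-then-dequeue sequence can be sketched as a simple loop. The opcode-to-length table below is entirely hypothetical and stands in for whatever instruction set the central processor implements; the four-byte window mirrors the four bytes visible on the Y bus.

```python
# Hypothetical instruction lengths, keyed by opcode byte.
INSTRUCTION_LENGTH = {0x01: 1, 0x02: 2, 0x03: 4}


def consume_instructions(stream):
    """Split a prefetched byte stream into variable-length instructions."""
    position = 0
    instructions = []
    while position < len(stream):
        window = stream[position:position + 4]   # bytes on the Y bus
        length = INSTRUCTION_LENGTH[window[0]]   # decoded from the opcode
        if length > len(window):
            break                                # wait for more prefetch
        instructions.append(bytes(window[:length]))
        position += length                       # BDQ = length, DQCLK pulse
    return instructions


prefetched = bytes([0x01, 0x02, 0xAA, 0x03, 0x10, 0x20, 0x30])
decoded = consume_instructions(prefetched)
```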





Using the RXMIT input, the byte queue can resend a block of
data through dequeuing rather than having to requeue it. This
is useful for locking the loops into the byte queue and allows 
the processor to run faster than if it had to refetch instructions 
from memory or cache. Figure 9 illustrates how a loop can 
execute directly out of the byte queue. 
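
A minimal model of the retransmit behavior is sketched below, using the counts from Figure 5 (8 bytes remaining before RXMIT, 16 after). Class and method names are hypothetical, and the model simply refuses to retransmit once more than 128 bytes have been queued, since the effect is not defined in that case.

```python
class RetransmitModel:
    """Sketch of a queue whose dequeue pointer can be rewound."""

    LIMIT = 128  # bytes queued since RESET for which RXMIT is defined

    def __init__(self):
        self.ram = []
        self.dequeue_ptr = 0

    def queue(self, data):
        self.ram.extend(data)

    def dequeue(self, nbytes):
        out = self.ram[self.dequeue_ptr:self.dequeue_ptr + nbytes]
        self.dequeue_ptr += len(out)
        return out

    def count(self):
        return len(self.ram) - self.dequeue_ptr

    def rxmit(self):
        """Rewind the dequeue pointer; the queue pointer is unchanged."""
        if len(self.ram) > self.LIMIT:
            raise RuntimeError("RXMIT undefined after more than 128 bytes")
        self.dequeue_ptr = 0


loop = RetransmitModel()
loop.queue(range(16))        # a 16-byte loop body
loop.dequeue(4)
loop.dequeue(4)
before = loop.count()        # 8 bytes left
loop.rxmit()
after = loop.count()         # all 16 bytes available again
```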










Using Am29338 as a Hardware Mailbox in 
Multiprocessing System 






A mailbox is a communication device between loosely coupled 
processes in a multi-programming system. Messages from 
one process to another are queued in the mailbox on a first-in, 
first-out (FIFO) basis. In a multiprocessing system, hardware 
mailboxes are required. This can be implemented using the 
Am29338 as shown in Figure 10. 








[Figure 8 labels: Address Bus, Instruction Fetch]

Figure 8. Instruction-Prefetch Queue


[Figure 9 annotations]

test: cmp X 50
bit body

Note: This describes a block of macro instructions.

Branch Succeeds: reading the loop from the beginning of
byte queue again

Branch Fails: proceeds with the following
prefetched instructions


When a process wishes to send a message to the mailbox, it
calls a special operating-system routine. This routine first
reads the status of the mailbox; if it is not FULL, the routine
writes the message to the mailbox and returns to the
calling process. If the mailbox is FULL, the operating system
blocks the calling process on a special queue and enables
interrupts from the mailbox. When a slot becomes available in
the mailbox, the sending processor is interrupted. The
interrupt routine sends the message to the mailbox, disables
interrupts from the mailbox, and unblocks the blocked
process. On the receiving side, the EMPTY status of the mailbox
must be available to the receiving processor in order to allow
the receiving process to be blocked if the mailbox is empty.
When a mailbox slot becomes filled, a blocked process must
be awakened by interrupting the receiving processor.
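
The sending-side protocol can be sketched in software as follows. This is an illustrative model only: the class and its fields are hypothetical, the interrupt is simulated by a check inside receive(), and keeping interrupts enabled while further senders remain blocked is an assumption that goes slightly beyond the single-sender case described above.

```python
from collections import deque


class MailboxModel:
    """Sketch of the FULL/EMPTY handshake around a FIFO mailbox."""

    def __init__(self, slots):
        self.slots = slots
        self.messages = deque()
        self.blocked = deque()           # messages of blocked senders
        self.interrupts_enabled = False

    def send(self, message):
        """Return True if delivered now, False if the sender blocks."""
        if len(self.messages) < self.slots:      # mailbox not FULL
            self.messages.append(message)
            return True
        self.blocked.append(message)             # block the caller
        self.interrupts_enabled = True           # wait for a free slot
        return False

    def receive(self):
        """Return the oldest message, or None if the mailbox is EMPTY."""
        if not self.messages:
            return None                          # receiver would block
        message = self.messages.popleft()
        if self.interrupts_enabled and self.blocked:
            # Simulated interrupt routine: deliver the pending message
            # and disable interrupts once no sender remains blocked.
            self.messages.append(self.blocked.popleft())
            self.interrupts_enabled = bool(self.blocked)
        return message


box = MailboxModel(slots=2)
results = [box.send("a"), box.send("b"), box.send("c")]
first = box.receive()
```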


The mailbox can be extended to operate in a heterogeneous 
multiprocessing system. In this type of system, processors 
with varying data-path widths and clock frequencies are 
interconnected. For example, a 32-bit main processor may 
control 8- to 16-bit coprocessors. The ability of the Am29338 
to match data-path widths and to queue and dequeue asyn- 
chronously allows processors of different widths and clock 
rates to communicate. 







[Figure 8 label, continued: Central Processing Unit]

[Figure 9 labels: RXMIT starts Dequeue Pointer; Execution]


Figure 9. Loop Locking Using Am29338


[Figure 10 labels: each processor side provides Chip Select,
Read/Write, Control/Data, Interrupt Req, and Reset; Processor 1
Clock and Processor 1 Data; Processor 2 Clock and Processor 2
Data; BQ and POS inputs]


Figure 10. Implementation of a Hardware Mailbox 


Suggestions for Power and Ground Pin 
Connections 


The Am29338 operates in an environment of fast signal rise 
times and substantial switching currents. Therefore, care must 
be exercised during circuit board design and layout, as with 
any high-performance component. The following is a sug- 
gested layout, but since systems vary widely in electrical 
configuration, an empirical evaluation of the intended layout is 
recommended. 


The VCCT and GNDT pins, which carry output driver switching
currents, tend to be electrically noisy. The VCCE and GNDE
pins, which supply the ECL core of the device, tend to produce
less noise, and the circuits they supply may be adversely
affected by noise spikes on the VCCE plane. For this reason, it
is best to provide isolation between the VCCE and VCCT pins,
as well as independent decoupling for each. Isolating the
GNDE and GNDT pins is not required.


Printed Circuit-Board Layout Suggestions 


1. Use of a multi-layer PC board with separate power, ground, 
and signal planes is highly recommended. 


2. All VCCE and VCCT pins should be connected to the VCC
plane. VCCT pins should be isolated from VCCE pins by means
of a slot cut in the VCC plane; see Figure 11. By physically
separating the VCCE and VCCT pins, coupled noise will be
reduced.


3. All GNDE and GNDT pins should be connected directly to
the ground plane.


4. The VCCT pins should be decoupled to ground with a 0.1-µF
ceramic capacitor and a 10-µF electrolytic capacitor, placed
as closely to the Am29338 as is practical. VCCE pins should
be decoupled to ground in a similar manner.


A suggested layout is shown in Figure 11. 


[Figure 11 labels: Isolation Cut; Through Hole VCC Plane
Connection. Capacitor values: three 10-µF capacitors and three
0.1-µF capacitors.]


Figure 11. Suggested Printed Circuit-Board Layout

ABSOLUTE MAXIMUM RATINGS

Storage Temperature                        -65 to +150°C
Case Temperature
  with Power Applied                       -55 to +125°C
Supply Voltage
  with Respect to Ground                   -0.5 to +7.0 V
DC Voltage Applied to Outputs
  for HIGH State                           -0.5 V to +VCC Max.
DC Input Voltage                           -0.5 V to +5.5 V

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES

Commercial (C) Devices
  Case Temperature (TC)                    0 to +85°C
  Supply Voltage (VCC)                     +4.75 to +5.25 V
  Thermal resistance (under 200 lfm)       [value not recoverable]

Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating range

[Recoverable test conditions only; parameter symbols and most
limits are illegible.]

Output HIGH Voltage                  VIN = VIL or VIH, IOH = -3 mA
Output LOW Voltage                   VCC = Min., VIN = VIL or VIH
Input HIGH Current                   VCC = Max., VIN = 2.4 V
Input HIGH Current                   VCC = Max., VIN = 5.5 V
Output Short-Circuit Current
  (ISC) (Note 3)                     VCC = Max. + 0.5 V, VO = 0.5 V
Power Supply Current                 VCC = Max., TC = 0 to +85°C,
                                     all inputs HIGH


Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type.
2. Typical values are for VCC = 5.0 V, +25°C ambient and maximum loading.
3. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second.







SWITCHING CHARACTERISTICS over operating range (Note 1) 















A. Combinational Propagation Delays 


From          To

QCLK          A-EMPTY or EMPTY
QCLK          A-FULL or FULL

[Remaining rows, including further QCLK and DQCLK entries, and
the delay values in ns are not recoverable.]

Parameter

Bytes Dequeued Setup
Bytes Dequeued Hold
Bytes Queued Setup
Bytes Queued Hold
Byte Swap Setup
Byte Swap Hold
Data Setup
Data Hold
Dequeue Enable Setup
DQCLK Dequeue Min. Pulse Width LOW
DQCLK Dequeue Min. Pulse Width HIGH
DQCLK Dequeue Min. Cycle Time              80 ns
QCLK Queue Min. Pulse Width LOW
QCLK Queue Min. Pulse Width HIGH
QCLK Queue Min. Cycle Time

[Other limits, in ns, are not recoverable.]

Notes: 1. Case temperature (TC) = 0 to +85°C, supply voltage (VCC) = 5 V ±5%. It is the responsibility of the user to maintain a case
temperature of +85°C or less. AMD recommends an air velocity of at least 200 linear feet per minute over the heatsink.
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SWITCHING TEST CIRCUITS 


[Circuit labels: VCC, S1, S2, VOUT, CL]

R1 = (5.0 - VBE - VOL) / (IOL + VOL/1K)

A. Three-State Outputs


[Circuit labels: VOUT]

R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2)

B. Normal Outputs


Notes: 1. CL = 50 pF includes scope probe, wiring and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = 5.0 pF for output disable tests.
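
The load-resistor expression for the test circuits can be evaluated numerically as in the sketch below. The operating point used in the example (VBE = 0.7 V, VOL = 0.5 V, IOL = 8 mA, R2 = 1 kΩ) is purely illustrative and is not taken from this data sheet; the function name is hypothetical.

```python
def load_resistor_r1(vbe, vol, iol, r2, vcc=5.0):
    """R1 = (VCC - VBE - VOL) / (IOL + VOL / R2), in ohms.

    Voltages in volts, IOL in amperes, R2 in ohms.
    """
    return (vcc - vbe - vol) / (iol + vol / r2)


# Illustrative values only (assumed, not data-sheet limits).
r1 = load_resistor_r1(vbe=0.7, vol=0.5, iol=0.008, r2=1000.0)
print(f"R1 is about {r1:.0f} ohms")
```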
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Test Philosophy and Methods 


The following points give the general philosophy that we apply
to tests that must be properly engineered if they are to be
implemented in an automatic environment. The specifics of
which philosophies apply to which tests are shown in the data
sheet.


1. Ensure the part is adequately decoupled at the test head.
Large changes in Vcc current as the device switches may
cause erroneous function failures due to Vcc changes.


2. Do not leave inputs floating during any tests, as they may
start to oscillate at high frequency.


3. Do not attempt to perform threshold tests at high speed.
Following an output transition, ground current may change 
by as much as 400 mA in 5-8 ns. Inductance in the ground 
cable may allow the ground pin at the device to rise by 
hundreds of millivolts momentarily. Current level may vary 
from product to product. 
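
The ground-bounce figure quoted above follows from V = L x (di/dt). The sketch below is only a rough illustration: the 5 nH ground-path inductance is an assumed value chosen for the example, not a figure from this data sheet, and the function name is hypothetical.

```python
def ground_bounce_volts(inductance_h, delta_i_a, delta_t_s):
    """Voltage across an inductance for a linear current ramp: V = L * di/dt."""
    return inductance_h * delta_i_a / delta_t_s


# Assumption for illustration only: 5 nH of ground-path inductance.
ASSUMED_GROUND_INDUCTANCE_H = 5e-9

# 400 mA ground-current change in 5 ns and in 8 ns, as quoted in the text.
for ramp_ns in (5, 8):
    volts = ground_bounce_volts(ASSUMED_GROUND_INDUCTANCE_H, 0.400, ramp_ns * 1e-9)
    print(f"{ramp_ns} ns ramp: about {volts * 1000:.0f} mV of ground bounce")
```

With this assumed inductance the result lands in the hundreds-of-millivolts range the text describes; a longer ground cable would make it proportionally worse.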


4. Use extreme care in defining input levels for AC tests.
Many inputs may be changed at once, so there will be
significant noise at the device pins and they may not
actually reach VIL or VIH until the noise has settled. AMD
recommends using VIL ≤ 0 V and VIH ≥ 3.0 V for AC tests.


5. To simplify failure analysis, programs should be designed
to perform DC, Function, and AC tests as three distinct
groups of tests.


6. Capacitive Loading for AC Testing


Automatic testers and their associated hardware have stray
capacitance that varies from one type of tester to another
but is generally around 50 pF. This, of course, makes it
impossible to make direct measurements of parameters
that call for a smaller capacitive load than the associated
stray capacitance. Typical examples of this are the so-called
"float delays," which measure the propagation
delays into the high-impedance state and are usually
specified at a load capacitance of 5.0 pF. In these cases,
the test is performed at the higher load capacitance
(typically 50 pF) and engineering correlations based on
data taken with a bench setup are used to predict the
result at the lower capacitance.


Similarly, a product may be specified at more than one
capacitive load. Since the typical automatic tester is not
capable of switching loads in mid-test, it is impossible to
make measurements at both capacitances even though
they may both be greater than the stray capacitance. In
these cases, a measurement is made at one of the two
capacitances. The result at the other capacitance is
predicted from engineering correlations based on data
taken with a bench setup and the knowledge that certain
DC measurements (IOH, IOL, for example) have already
been taken and are within spec. In some cases, special DC
tests are performed in order to facilitate this correlation.


7. Threshold Testing 


The noise associated with automatic testing (due to the
long, inductive cables), and the high gain of the tested
device when in the vicinity of the actual device threshold,
frequently give rise to oscillations when testing high-speed
circuits. These oscillations are not indicative of a reject
device, but instead, of an overtaxed test system. To
minimize this problem, thresholds are tested at least once
for each input pin. Thereafter, "hard" high and low levels
are used for other tests. Generally this means that function
and AC testing are performed at "hard" input levels rather
than at VIL Max. and VIH Min.


8. AC Testing 


Occasionally parameters are specified that cannot be
measured directly on automatic testers because of tester
limitations. Data input hold times often fall into this category.
In these cases, the parameter in question is guaranteed
by correlating these tests with other AC tests that have
been performed. These correlations are arrived at by the
cognizant engineer using data from precise bench measurements
in conjunction with the knowledge that certain DC
parameters have already been measured and are within
spec.


In some cases, certain AC tests are redundant since they 
can be shown to be predicted by other tests that have 
already been performed. In these cases, the redundant 
tests are not performed. 


9. Output Short-Circuit Current Testing 
When performing IOS tests on devices containing RAM or
registers, great care must be taken that undershoot caused
by grounding the high-state output does not trigger parasitic
elements which in turn cause the device to change state.
In order to avoid this effect, it is common to make the
measurement at a voltage (Voutput) that is slightly above
ground. The Vcc is raised by the same amount so that the
result (as confirmed by Ohm's law and precise bench
testing) is identical to the Vout = 0, Vcc = Max. case.
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SWITCHING WAVEFORMS (Cont'd.) 


[Waveform diagram not recoverable from the source. Legible signal labels: OE, Y, PYERR, Full/A-Full, Empty/A-Empty.]

Dequeue Cycle

[Waveform diagram not recoverable from the source. Legible signal labels: Empty/A-Empty, Full/A-Full, CNT, Y0-31, PYERR.]

RESET Timing Diagram 


Notes: 1. Minimum time RESET must be asserted. 
2. This timing diagram is applicable to RXMIT. 
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SWITCHING WAVEFORMS (Cont'd.) 
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CHAPTER 4 





Arithmetic Processors 





Am29C323 CMOS 32-Bit Parallel Multiplier
Am29325 32-Bit Floating-Point Processor
Am29C325 CMOS 32-Bit Floating-Point Processor
Am29C327 CMOS Double-Precision Floating-Point Processor











Am29C323 


CMOS 32-Bit Parallel Multiplier 


PRELIMINARY 





DISTINCTIVE CHARACTERISTICS

• 32-Bit Three-Bus Architecture
  - The device has two 32-bit input ports and one 32-bit
    output port with clocked multiply time of 100 ns
• Speed Selects
  - 80- and 55-ns speed-select parts
• Single Clock with Register Enables
  - The Am29C323 is controlled by one clock with
    individual register enables
• Supports Multiprecision Multiplication
  - The device has dual 32-bit registers on each data
    input port to perform multiprecision multiplication
• Registers can be made transparent
  - Input and output registers can be made transparent
    independently to eliminate unwanted pipeline delay
• Supports Two's Complement, Unsigned or Mixed Numbers
• Data Integrity Through Master-Slave Mode and Parity Check/Generate
  - Parity check/generate catches inter-device
    connection errors and master/slave mode provides
    complete function check


GENERAL DESCRIPTION 


The Am29C323 is a high-speed 32 x 32-Bit CMOS Parallel 
Multiplier with 67-Bit Accumulator. The part is designed to 
maximize system level performance by providing a 32-bit 
three bus architecture and a single clock with register 
enables. 


The Am29C323 further enhances system throughput by
providing individual register feedthrough controls, byte
parity checking on both input ports and generation on the
output port, and dual input registers on each data input bus
to support multiprecision multiplication. The Am29C323 can
manage a wide variety of data types, including two's
complement, unsigned, or mixed mode input formats. A
64 x 64-bit multiplication can be performed in seven clock
cycles, including input and output. Additional features
provided are a format adjust control allowing for standard
output or left shifted output suitable for fractional two's
complement arithmetic, rounding, and master/slave operation.


The Am29C323 is designed in low-power, high-speed 
CMOS with TTL-compatible I/O. The device is housed in a 
169-lead pin-grid-array package. 
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RELATED AMD PRODUCTS 


Part No. | Description
Am29325 | 32-Bit Floating Point Processor
Am29C325 | CMOS 32-Bit Floating Point Processor
Am29331 | 16-Bit Microprogram Sequencer
Am29C331 | CMOS 16-Bit Microprogram Sequencer
Am29C332 | CMOS 32-Bit Extended Function ALU
Am290517 |
Am29C334 | CMOS 64 x 18 Four-Port Dual Access Register File





DETAILED BLOCK DIAGRAM 
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CONNECTION DIAGRAM 
169-Lead PGA 
Bottom View 
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*Pinout observed from pin side of package. 
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PIN DESIGNATIONS 
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LOGIC SYMBOL 
PY0-PY3   Y0-Y31   X0-X31   PX0-PX3

CLK
ENXA, ENXB
ENYA, ENYB
ENI
ENP, ENT
FA
TSEL
PSEL0, PSEL1
OE
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XSEL, YSEL 
TCX, TCY
ACC0, ACC1
RND
FTX, FTY, FTI
FTP
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ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29C323 -1 G C B


a. DEVICE NUMBER/DESCRIPTION
Am29C323
CMOS 32-Bit Parallel Multiplier

b. SPEED OPTION
-1 = 80 ns
-2 = 55 ns

c. PACKAGE TYPE
G = 169-Lead Pin Grid Array without Heatsink
(CGX169)

d. TEMPERATURE RANGE
C = Commercial (0 to +70°C)

e. OPTIONAL PROCESSING
Blank = Standard processing
B = Burn-in


Valid Combinations 


Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations, to check on newly released combinations, and 
to obtain additional data on AMD's standard military grade 
products. 
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ORDERING INFORMATION 
APL Products 


AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL
products is formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Device Class
d. Package Type
e. Lead Finish


AM29C323 /B Z C


e. LEAD FINISH
C = Gold


d. PACKAGE TYPE 
Z = 169-Lead Pin Grid Array without Heatsink 
(CGX169) 


c. DEVICE CLASS 
/B = Class B 


b. SPEED OPTION 
Not Applicable 


a. DEVICE NUMBER/DESCRIPTION 
Am29C323 
CMOS 32-Bit Parallel Multiplier 


Valid Combinations


Valid Combinations list configurations planned to be 
supported in volume for this device. Consult the local AMD 
sales office to confirm availability of specific valid 
combinations or to check for newly released valid 
combinations. 


Group A Tests 


Group A tests consist of Subgroups 
1, 2, 3, 7, 8, 9, 10, 11. 


PIN DESCRIPTION 


ACC0, ACC1 Accumulator Control (Input)
Accumulator control lines used to determine accumulator
function: PASS, ACCUMULATE, and SHIFT and
ACCUMULATE.

CLK Clock (Input)
Clock input for all registers.

ENI Instruction Register Enable (Input; Active LOW)
Register enable for instruction register I.

ENP Accumulator Register Enable (Input; Active LOW)
Register enable for product register P.

ENT Temporary Register Enable (Input; Active LOW)
Register enable for temporary register T.

ENXA, ENXB Multiplicand Register Enable (Input; Active LOW)
Register enables for multiplicand data input registers XA
and XB.

ENYA, ENYB Multiplier Register Enable (Input; Active LOW)
Register enables for multiplier data input registers YA and
YB.

FA Format Adjust (Input)
Format adjust selects either a full 64-bit product (HIGH) or a
left-shifted 63-bit product suitable for fractional two's
complement arithmetic (LOW).

FTP Feedthrough Control (Input; Active HIGH)
Feedthrough control for product register.

FTX, FTY, FTI Feedthrough Control (Input; Active HIGH)
Feedthrough control lines for X, Y, and I registers.

HDERR Hard Error Flag (Output)
Used when two Am29C323s are configured as master and
slave to indicate hardware errors.

OE Output Enable Control (Input; Active LOW)
Used to enable (LOW) or disable (HIGH) the P output port.

P0-P31 Product Output (Input/Output; Three State)
Product output for P port.


FUNCTIONAL DESCRIPTION 
Architecture 


The Am29C323 comprises a high speed 32 by 32-bit multiplier 
array, a 67-bit accumulator, and a 32-bit data path. 


Multiplier Array 


The multiplier is a 32 by 32-bit array that produces a 64-bit 
product. This product is then fed to the accumulator section. 


Accumulator 


The accumulator is 67 bits wide. It performs accumulation for 
sum of product operations and multiprecision multiplication 
operations. The accumulator can perform three operations: 
store product without accumulation, accumulate product, and 
shift accumulator value and accumulate with product. 


The shift and accumulate shifts the value in the product 
register 32 bits to the right (effectively moving the most 
significant 32 bits to the least significant 32 bits) and sign 
extends to a full 64 bits. This shifted value is then accumulated 
with the output of the multiplier array. 
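
The three accumulator operations described above can be modeled in a few lines. This is a behavioral sketch only, not the device's internal implementation: the operation labels and the `wrap` helper are illustrative names, and values are held as signed Python integers wrapped to the 67-bit accumulator width.

```python
ACC_BITS = 67  # accumulator width stated in the text


def wrap(value, bits=ACC_BITS):
    """Wrap an integer to a signed two's complement value of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def accumulate(acc, product, op):
    """Return the next accumulator value for one multiplier-array product."""
    if op == "PASS":      # store product without accumulation
        return wrap(product)
    if op == "ACC":       # accumulate product
        return wrap(acc + product)
    if op == "S/A":       # shift right 32 bits (sign preserved), then accumulate
        return wrap((acc >> 32) + product)
    raise ValueError(f"unknown accumulator operation: {op}")
```

Python's `>>` on a negative integer is an arithmetic shift, which stands in here for the sign extension the text describes.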


PRERR Parity Error Flag (Input/Output; Three State)
Indicates a parity error on the input buses.

PP0-PP3 Byte Parity (Input/Output; Three State)
Byte parity generated on P output port (even parity).

PSEL0, PSEL1 Product Control (Input)
Used to select desired output including disabling P and PP
output ports.

PX0-PX3 Byte Parity (Input)
Byte parity inputs on X input port (even parity).

PY0-PY3 Byte Parity (Input)
Byte parity inputs on Y input port (even parity).

RND Round Control (Input; Active HIGH)
Round control for rounding the most significant product.

SLAVE Master/Slave Control (Input)
Used to determine mode of operation.

TCX, TCY Mode Control (Input)
Mode control inputs for each input data word; LOW for
unsigned data and HIGH for two's complement format.

TSEL Select Control (Input)
Used to route the most significant product register (HIGH) or
the least significant product register (LOW) into the
temporary register.

X0-X31 Multiplicand Data (Input)
Multiplicand data input for X port.

XSEL X Register Select (Input)
Control line used to route the contents of either the XA
register (HIGH) or XB register (LOW) into the multiplier
array.

Y0-Y31 Multiplier Data (Input)
Multiplier data input for Y port.

YSEL Y Register Select (Input)
Control line used to route the contents of either the YA
register (HIGH) or YB register (LOW) into the multiplier
array.


The 67-bit width is necessary to contain overflows in internal 
accumulations. These overflows are maintained and used 
when the product register is right shifted in the multiprecision 
multiplies. The lower 64 bits contain the 64-bit output while the 
upper 3 bits contain the overflow. 


Data Path 


The 32-bit data path consists of X and Y input buses; the P
output bus; data registers XA, XB, YA, YB, and the product
accumulator; two multiplier input multiplexers; byte parity input
checkers; byte parity output generators; and master/slave
comparators. Input operands enter the device through the two
32-bit input buses, X0-X31 and Y0-Y31. These operands
may then be stored in one of the two registers for each bus
(XA or XB for X, YA or YB for Y) or they may be fed directly
through to the multiplier array. Input parity checking is performed
as soon as the operands are put on the input buses.
The signals used for output parity generation are taken from
the input side of the output translator. In case of parity error,
PRERR is enabled HIGH.
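
As an illustration of the byte-wise even-parity scheme on the input ports, the sketch below computes one parity bit per byte and flags a mismatch. The mapping of parity bit i to data bits 8i through 8i+7 is an assumption made for the example; the pin description does not spell out the byte ordering, and the function names are hypothetical.

```python
def even_parity_bits(word):
    """Return four even-parity bits for a 32-bit word.

    Bit i of the result covers byte i (data bits 8*i .. 8*i+7); this byte
    ordering is an assumption. Each parity bit makes the total number of
    ones in (byte + parity bit) even.
    """
    bits = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        bits.append(bin(byte).count("1") & 1)
    return bits


def parity_error(word, parity_bits):
    """True if any supplied parity bit disagrees with the data word."""
    return even_parity_bits(word) != list(parity_bits)
```

A single flipped data bit changes exactly one byte's parity, so `parity_error` reports it; two flipped bits in the same byte go undetected, as with any single-bit parity scheme.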




















Operational Modes


The Am29C323 can perform signed, unsigned, or mixed mode
multiplication. These different numerical representations are
controlled by TCX and TCY. A HIGH input on one of these
lines indicates to the device that the respective input should
be treated as a two's complement number; a LOW, an
unsigned number. The output format is unsigned when both
inputs are unsigned. The output format is two's complement
when either or both inputs are two's complement.
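
A small behavioral model makes the effect of TCX and TCY concrete. This is a sketch of the arithmetic only; the function names are illustrative and nothing here reflects the internal array design.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def as_operand(word, twos_complement):
    """Interpret a 32-bit pattern as unsigned (LOW) or two's complement (HIGH)."""
    word &= MASK32
    if twos_complement and word & 0x80000000:
        return word - (1 << 32)
    return word


def multiply(x_word, y_word, tcx, tcy):
    """Return the 64-bit product pattern for the selected operand modes."""
    product = as_operand(x_word, tcx) * as_operand(y_word, tcy)
    return product & MASK64


# The same bit patterns give different products depending on TCX/TCY.
print(hex(multiply(0xFFFFFFFF, 2, tcx=1, tcy=0)))  # X read as -1 (mixed mode)
print(hex(multiply(0xFFFFFFFF, 2, tcx=0, tcy=0)))  # X read as 4294967295
```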


Slave Mode 


Each output has an associated comparator which compares
the signal on the output pin with the signal provided to the
output driver. If any output does not agree, HDERR
is asserted. When not in slave mode, this enables the
multiplier to check for contention and bus shorts. However,
when in slave mode, one multiplier can be used to detect
faults in both internal functions and interconnections of the
other multiplier. This is accomplished through the master/slave
configuration, where the two multipliers operate in
parallel. One multiplier is the master and operates normally;
the other operates in slave mode.


In slave mode all outputs are turned into inputs from the 
master, except for the HDERR signal. Since the slave is 
operated in parallel with the master, it can compare the results 
it generates to those of the master and signal an error if they 
differ. 


Command Description and Formats 


The accumulator is controlled by ACC0 and ACC1. These
lines are used to select any of the three operations that the
accumulator can perform. This instruction set is described in
Table 1.


The temporary output register is controlled by TSEL and FA.
These lines are used to select any of the four different sets of
data that can be stored in the temporary register. This
instruction set is described in Table 2.


The output multiplexer is controlled by PSEL0, PSEL1, and
FA. These lines are used to select any of the five different sets
of data that can be output through the P port. PSEL0 and
PSEL1 can also be used to disable the outputs. (This
instruction is independent of OE.) This instruction set is
described in Table 3.


Format Adjust (FA) is used to select either a full 64-bit product
or a left-shifted 63-bit product suitable for fractional two's
complement arithmetic. This shifting increases the precision of
the upper half of the product word by eliminating the redundant
sign bit. The output data formats show the effect of FA.


Round (RND) is used to round the upper 32 bits of the 64-bit
product. If only the upper 32 bits of the product are being
used, then the lower 32 bits are truncated when rounding is
not used (RND = 0). If rounding is used (RND = 1), then a "1"
is added to the most significant of the lower 32 bits. This
results in a smaller possible error. Rounding should only be used
when the lower 32 bits are to be truncated.
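
The effect of FA and RND on the product word can be sketched as follows. This is a simplified model under stated assumptions: the two functions are kept separate because the text does not say in which order the device applies the shift and the rounding, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def format_adjust(product64, fa):
    """FA HIGH: full 64-bit product. FA LOW: shift left one bit, dropping
    the redundant sign bit of a fractional two's complement product."""
    if fa:
        return product64 & MASK64
    return (product64 << 1) & MASK64


def upper_word(product64, rnd):
    """Return the upper 32 bits; with RND set, first add a 1 at the most
    significant bit of the lower 32 bits (bit 31)."""
    if rnd:
        product64 = (product64 + (1 << 31)) & MASK64
    return (product64 >> 32) & 0xFFFFFFFF


# 0.5 * 0.5 with both operands in fractional two's complement (0x40000000).
raw = 0x40000000 * 0x40000000
print(hex(upper_word(format_adjust(raw, fa=1), rnd=0)))  # unshifted upper word
print(hex(upper_word(format_adjust(raw, fa=0), rnd=0)))  # shifted upper word
```

In the shifted case the upper word has the same binary point as the inputs, which is why that format suits fractional arithmetic.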


User Visible Register Descriptions 


The Am29C323 contains seven different register sets, each
with its own clock enable. Two 32-bit registers are attached to
each of the input data buses. These registers are differentiated
by the suffix A or B. For example, the X bus has registers
XA and XB. The 67-bit accumulator register can be used as a
regular product register when the part is used as a multiplier
only or as the register part of the accumulator section. The 32-bit
temporary output register is included to aid in the pipelining
of multiprecision operations. An instruction register is also
provided.


All of these registers can be made transparent with the
exception of the accumulator register and the temporary
register. The product from the multiplier can be fed directly to
the output by using the FTP control line.


TABLE 1. ACCUMULATOR OPERATION INSTRUCTIONS

ACC1 | ACC0 | Accumulator Operation
[Table rows not recoverable from the source.]


TABLE 2. INPUT SELECT INSTRUCTIONS FOR TEMPORARY (T) REGISTER

TSEL | FA | Temp Reg Input
[Table rows not recoverable from the source.]


TABLE 3. OUTPUT SELECT INSTRUCTIONS FOR PRODUCT (P) PORT

PSEL1 | PSEL0 | FA | P Port Output
[Table rows not recoverable from the source.]
Am29C323 X AND Y INPUT DATA FORMATS

Fractional Two's Complement (TCX, TCY = 1)
Bit:     31     30     29     28    ...  2      1      0
Weight:  -2^0   2^-1   2^-2   2^-3  ...  2^-29  2^-30  2^-31

Integer Two's Complement (TCX, TCY = 1)
Bit:     31     30     29     28    ...  2      1      0
Weight:  -2^31  2^30   2^29   2^28  ...  2^2    2^1    2^0

Unsigned Fractional (TCX, TCY = 0)
Bit:     31     30     29     28    ...  2      1      0
Weight:  2^-1   2^-2   2^-3   2^-4  ...  2^-30  2^-31  2^-32

Unsigned Integer (TCX, TCY = 0)
Bit:     31     30     29     28    ...  2      1      0
Weight:  2^31   2^30   2^29   2^28  ...  2^2    2^1    2^0


Am29C323 P-PORT OUTPUT DATA FORMATS

Each row lists the weights of P-port bits 31, 30, 29, 28, 27, 26, ..., 3, 2, 1, 0.

Fractional Two's Complement (Shifted)*
FA = 0, PSEL1 = 1, PSEL0 = 0:  -2^0  2^-1  2^-2  2^-3  2^-4  2^-5  ...  2^-28  2^-29  2^-30  2^-31
FA = 0, PSEL1 = 0, PSEL0 = 1:  2^-32  2^-33  2^-34  2^-35  2^-36  2^-37  ...  2^-60  2^-61  2^-62  2^-63**

Fractional Two's Complement
FA = 1, PSEL1 = 1, PSEL0 = 0:  -2^1  2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-27  2^-28  2^-29  2^-30
FA = 1, PSEL1 = 0, PSEL0 = 1:  2^-31  2^-32  2^-33  2^-34  2^-35  2^-36  ...  2^-59  2^-60  2^-61  2^-62

Integer Two's Complement
FA = 1, PSEL1 = 1, PSEL0 = 0:  -2^63  2^62  2^61  2^60  2^59  2^58  ...  2^35  2^34  2^33  2^32
FA = 1, PSEL1 = 0, PSEL0 = 1:  2^31  2^30  2^29  2^28  2^27  2^26  ...  2^3  2^2  2^1  2^0

Unsigned Fractional
FA = 1, PSEL1 = 1, PSEL0 = 0:  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  ...  2^-29  2^-30  2^-31  2^-32
FA = 1, PSEL1 = 0, PSEL0 = 1:  2^-33  2^-34  2^-35  2^-36  2^-37  2^-38  ...  2^-61  2^-62  2^-63  2^-64

Unsigned Integer
FA = 1, PSEL1 = 1, PSEL0 = 0:  2^63  2^62  2^61  2^60  2^59  2^58  ...  2^35  2^34  2^33  2^32
FA = 1, PSEL1 = 0, PSEL0 = 1:  2^31  2^30  2^29  2^28  2^27  2^26  ...  2^3  2^2  2^1  2^0

*In this format, an overflow occurs in the attempted multiplication of the two's complement number -1.000 with itself, yielding a
product of +1.000 which cannot be represented in this format.
**This bit position (2^-63) equals zero in this format.
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64 x 64 Multiplication 





To perform a 64 x 64-bit multiplication using the Am29C323,
each 64-bit input must be split into two 32-bit inputs: a most
significant half and a least significant half (XW1 and XW0 or
YW1 and YW0, respectively). These 32-bit inputs are then
used to perform the four multiplications needed to obtain the
128-bit product. This product is represented in four 32-bit
words, PW3 - PW0, the least significant word being PW0. The
product is output 32 bits at a time through the product (P) port.
The following equation shows the required multiplications:

X * Y = ((XW1 * YW1) * 2^64) + ((XW0 * YW1) * 2^32)
      + ((XW1 * YW0) * 2^32) + ((XW0 * YW0) * 2^0)
      = (PW3 * 2^96) + (PW2 * 2^64) + (PW1 * 2^32)
      + (PW0 * 2^0)

The Am29C323 uses an internal accumulator to sum these
intermediate products. The previous equation, in a slightly
different form, is shown with the necessary instructions below:

    X =                  XW1   XW0
    Y =                * YW1   YW0

    XW0 * YW0            Multiply only
    XW1 * YW0            Mult & Shift/Acc
    XW0 * YW1            Mult & Accumulate
    XW1 * YW1            Mult & Shift/Acc

    P =        PW3   PW2   PW1   PW0

Table 4 details the movement of the input operands through
the Am29C323. Table 5 defines the microcode required to
perform a signed 64 x 64-bit multiplication. For an unsigned
multiplication, TCX and TCY are LOW for all cycles. The
operations and data movement are scheduled to produce a
single product in seven clock cycles or a new pipelined
product every four clock cycles.
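
The four-step sequence above (multiply only, shift/accumulate, accumulate, shift/accumulate) can be checked with a short behavioral model. This is a sketch of the arithmetic, not of the device's cycle-by-cycle timing; the function names are illustrative. Signed operands are passed as ordinary Python integers, so the high words come out signed and the low words unsigned, which corresponds to mixed-mode partial products.

```python
MASK32 = 0xFFFFFFFF


def multiply_64x64(x, y):
    """Return (PW3, PW2, PW1, PW0) for the 128-bit product of x and y."""
    xw1, xw0 = x >> 32, x & MASK32   # high word keeps the sign, low word is unsigned
    yw1, yw0 = y >> 32, y & MASK32

    acc = xw0 * yw0                  # Multiply only
    pw0 = acc & MASK32
    acc = (acc >> 32) + xw1 * yw0    # Mult & Shift/Acc
    acc = acc + xw0 * yw1            # Mult & Accumulate
    pw1 = acc & MASK32
    acc = (acc >> 32) + xw1 * yw1    # Mult & Shift/Acc
    pw2 = acc & MASK32
    pw3 = (acc >> 32) & MASK32
    return pw3, pw2, pw1, pw0


def join_words(words):
    """Reassemble (PW3, PW2, PW1, PW0) into one 128-bit pattern."""
    pw3, pw2, pw1, pw0 = words
    return (pw3 << 96) | (pw2 << 64) | (pw1 << 32) | pw0


# Unsigned check: the word sequence reproduces the full product.
x, y = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
assert join_words(multiply_64x64(x, y)) == x * y

# Signed check: the result is the two's complement pattern of the product.
assert join_words(multiply_64x64(-3, 5)) == (-15) & ((1 << 128) - 1)
```

Between the second and third steps the running sum can exceed 64 bits, which is the kind of internal overflow the 67-bit accumulator width is described as holding.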










TABLE 4. BUS AND REGISTER CONTENTS FOR A 64 x 64-BIT SIGNED MULTIPLICATION WITH ONE
COMPLETE EXTENDED MULTIPLICATION SHOWN IN THE UNSHADED CYCLES


[Table body not recoverable from the source.]

Note: MPY OP = Operation of multiplier array (X*Y) 
ACC OP = Operation of internal accumulator 
PASS = Pass through multiplier product 
ACC = Add previous result to current product 
S/A = Shift previous result then add to current product 


TABLE 5. INSTRUCTION MICROCODE FOR 64 x 64-BIT SIGNED MULTIPLICATION WITH ONE
COMPLETE EXTENDED MULTIPLICATION SHOWN IN THE UNSHADED CYCLES


[Table body not recoverable from the source.]
ABSOLUTE MAXIMUM RATINGS

Storage Temperature                              -65 to +150°C
Ambient Temperature Under Bias                   -55 to +125°C
Supply Voltage to Ground Potential Continuous    -0.3 to +7.0 V
DC Voltage Applied to Outputs For
  High Output State                              -0.3 to +Vcc + 0.3 V
DC Input Voltage                                 -0.3 to +Vcc + 0.3 V
DC Output Current, Into LOW Outputs
DC Input Current

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES

Commercial (C) Devices
Temperature (TA)                                 0 to +70°C
Supply Voltage (Vcc)                             +4.75 to +5.25 V

Military* (M) Devices
Temperature (TA)                                 -55 to +125°C
Supply Voltage (Vcc)                             +4.5 to +5.5 V

Operating ranges define those limits between which the
functionality of the device is guaranteed.

*Military Product 100% tested at TA = +25°C, +125°C, and
-55°C.


DC CHARACTERISTICS over operating range unless otherwise specified (for APL Products, Group A,
Subgroups 1, 2, 3 are tested unless otherwise noted)


















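[Table not fully recoverable from the source; parameter symbols and limit values are missing. Legible entries follow.]

Test conditions: Vcc = Min., VIN = VIH or VIL, IOH = -0.4 mA
Test conditions: Vcc = Min., VIN = VIH or VIL
Input HIGH Level
Input LOW Level
Test conditions: VIN = Vcc or GND
ICC
Power Dissipation: Vcc = 5.0 V
CPD Capacitance (Note 3): TA = 25°C, No Load; 3000 pF Typical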


Notes: 1. Vcc conditions shown as Min. or Max. refer to the military or commercial Vcc limits.
2. These input levels provide zero noise immunity and should only be statically tested in a noise-free environment (not
functionally tested).
3. CPD determines the no-load dynamic current consumption:
Icc (Total) = Icc (Static) + CPD x Vcc x f, where f is the switching frequency of the majority of the internal nodes,
normally one-half of the clock frequency. This specification is not tested.
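
As a worked example of the Note 3 formula, the sketch below evaluates the dynamic term for the typical 3000 pF capacitance at Vcc = 5.0 V. The 10 MHz clock is an assumed operating point chosen for illustration, and the static current is left as a parameter because its value is not legible in this copy of the table.

```python
def total_icc_amps(icc_static_a, cpd_farads, vcc_volts, clock_hz):
    """Icc(Total) = Icc(Static) + CPD * Vcc * f, with f taken as half the clock."""
    switching_hz = clock_hz / 2.0
    return icc_static_a + cpd_farads * vcc_volts * switching_hz


# Assumption for illustration: 10 MHz clock, static current set to zero
# so that only the dynamic term is shown.
dynamic_a = total_icc_amps(0.0, 3000e-12, 5.0, 10e6)
print(f"dynamic current: about {dynamic_a * 1000:.0f} mA")
```

Under these assumptions the dynamic term works out to roughly 75 mA; it scales linearly with clock frequency.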


SWITCHING CHARACTERISTICS over COMMERCIAL operating range 








[Table not fully recoverable from the source; most parameter numbers, symbols, and Min./Max. limits are missing.
Column headings: No., Parameter Symbol, Parameter Description, Test Conditions, Min., Max. Legible rows follow.]

UNCLOCKED MODE
Unclocked Multiply Time
Unclocked Multiply Time, X0-X31, Y0-Y31 to PP0-PP3

CLOCKED MODE
Clocked Multiply Time (FTX/Y/P = LOW)
Data to Product Register Setup
Data to Product Register Hold Time (FTX/Y = HIGH)
Instruction to Product Register Setup
Instruction to Product Register Hold
Clock Pulse Width HIGH
Clock Pulse Width LOW

SETUP AND HOLD TIMES
tSXY   Register XA, XB, YA, YB Setup Time
tHXY   Register XA, XB, YA, YB Hold Time
       Instruction Register Setup Time
       Instruction Register Hold Time
tSEN   Register Enable Setup Time
tHEN   Register Enable Hold Time
       TSEL Setup Time
       TSEL Hold Time

COMMON PARAMETERS
PSEL0-PSEL1 to P0-P31 (To Active State Only)
PSEL0-PSEL1 to PP0-PP3 (To Active State Only)
OE to P0-P31, PP0-PP3
OE or PSEL0-PSEL1 to P0-P31, PP0-PP3 Output Disable

Notes: 1. Instruction signals are XSEL, YSEL, TCX, TCY, ACC0, ACC1, and RND.
23 
SWITCHING CHARACTERISTICS over MILITARY operating range (for APL Products, Group A, Subgroups
9, 10, 11 are tested unless otherwise noted)














[Table not fully recoverable from the source; parameter numbers, symbols, and limits are missing.
Column headings: Parameter Symbol, Parameter Description, Test Conditions. Legible rows follow.]

UNCLOCKED MODE
Unclocked Multiply Time

CLOCKED MODE
Clocked Multiply Time (FTX/Y/P = LOW)
Clock to PP0-PP3
Clock Pulse Width HIGH

COMMON PARAMETERS
PSEL0-PSEL1 to P0-P31 (To Active State Only)
PSEL0-PSEL1 to PP0-PP3 (To Active State Only)
Output Disable
Data to HDERR

Notes: 1. Instruction signals are XSEL, YSEL, TCX, TCY, ACC0, ACC1, and RND.
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SWITCHING TEST CIRCUITS 


[Figure: switching test circuits TC001082 and TC001101; schematics not recoverable from the source.]


A. Three-State Outputs        B. Normal Outputs


Notes: 1. CL = 50 pF includes scope probe, wiring and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = TBD for output disable tests.





SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS 


[Waveform symbols not reproduced; each row corresponds to one symbol in the original key KS000010.]

INPUTS                             | OUTPUTS
MUST BE STEADY                     | WILL BE STEADY
MAY CHANGE FROM H TO L             | WILL BE CHANGING FROM H TO L
MAY CHANGE FROM L TO H             | WILL BE CHANGING FROM L TO H
DON'T CARE; ANY CHANGE PERMITTED   | CHANGING; STATE UNKNOWN
DOES NOT APPLY                     | CENTER LINE IS HIGH IMPEDANCE "OFF" STATE
[Waveform diagram WF022971 not recoverable from the source.]
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SWITCHING WAVEFORMS (Cont'd.) 


[Waveform diagram not recoverable from the source. Legible signal labels: CLK, ENXA, ENXB, ENYA, ENYB, ENI, PP0-PP3.]

Clocked Operation: FTX, Y, P, I = LOW


[Waveform diagram WF022960 not recoverable from the source. Legible signal labels: CLK, X0-X31, Y0-Y31, ENXA, ENXB, ENYA, ENYB, ENI, INST, PP0-PP3.]

Clocked Operation: Output Taken from Adder
(FTX, Y, I = LOW; FTP = HIGH; PSEL1 + PSEL0)
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SWITCHING WAVEFORMS (Cont'd.) 


[Waveform diagram WF022983 not recoverable from the source. Legible signal labels: CLK, INST, ENP, ENT, TSEL, P0-P31, PP0-PP3.]

Clocked Operation: Input Registers Bypassed
(FTX, Y, I = HIGH; FTP = LOW)
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SWITCHING WAVEFORMS (Cont'd.) 


[Waveform diagram WF022990 not recoverable from the source. Legible signal labels: X0-X31, Y0-Y31, INST, P0-P31, PP0-PP3.]

Unclocked Mode: FTX, Y, I, P = HIGH


[Waveform diagram WF023001 not recoverable from the source. Legible signal labels: PSEL0-PSEL1 (with PSEL1 = H, PSEL0 = H marked), OE, P0-P31, PP0-PP3.]

Output Select Timing
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SWITCHING WAVEFORMS (Cont'd.) 





[Waveform diagram WF023013 not recoverable from the source. Legible signal labels: X0-X31, PX0-PX3; Y0-Y31, PY0-PY3.]

PRERR Timing


[Waveform diagram WF023024 not recoverable from the source.]

Slave Mode Timing


[Test waveform diagram not recoverable from the source. Legible labels: INPUTS with 3.0 V, 1.5 V, and 0 V levels; CLOCK with 3.0 V and 0 V levels; OUTPUTS.]


INPUT/OUTPUT CURRENT INTERFACE DIAGRAMS 


[Diagrams IC000861 and IC000870 not recoverable from the source. Legible labels: OUTPUT, DRIVEN INPUT, Vcc.]


CI ≈ 5.0 pF, all inputs        CO ≈ 5.0 pF, all outputs
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Am29325 


32-Bit Floating-Point Processor 


DISTINCTIVE CHARACTERISTICS 


• Single VLSI device performs high-speed floating-point
  arithmetic
  - Floating-point addition, subtraction, and multiplication
    in a single clock cycle
  - Internal architecture supports sum-of-products,
    Newton-Raphson division
• 32-bit, three-bus flow-through architecture
  - Programmable I/O allows interface to 32- and 16-bit
    systems
• IEEE and DEC formats
  - Performs conversions between formats
  - Performs integer <-> floating-point conversions
• Six flags indicate operation status
• Register enables eliminate clock skew
• Input and output registers can be made transparent
  independently


GENERAL DESCRIPTION 


The Am29325 is a high-speed floating-point processor unit.
It performs 32-bit single-precision floating-point addition,
subtraction, and multiplication operations in a single VLSI
circuit, using the format specified by the proposed IEEE
floating-point standard, P754. The DEC single-precision
floating-point format is also supported. Operations for
conversion between 32-bit integer format and floating-point
format are available, as are operations for converting
between the IEEE and DEC floating-point formats. Any
operation can be performed in a single clock cycle. Six
flags (invalid operation, inexact result, zero, not-a-number,
overflow, and underflow) monitor the status of operations.


The Am29325 has a three-bus, 32-bit architecture, with two
input buses and one output bus. This configuration provides
high I/O bandwidth, allows access to all buses, and affords
a high degree of flexibility when connecting this device in a
system. All buses are registered, with each register having a
clock enable. Input and output registers may be made
transparent independently. Two other I/O configurations, a
32-bit, two-bus architecture and a 16-bit, three-bus
architecture, are user-selectable, easing interface with a wide
variety of systems. Thirty-two-bit internal feedforward
datapaths support accumulation operations, including
sum-of-products and Newton-Raphson division.


Fabricated with the high-speed IMOX™ bipolar process, 
the Am29325 is powered by a single 5-volt supply. The 
device is housed in a 145-terminal pin-grid-array package. 


Am29300 FAMILY HIGH-PERFORMANCE SYSTEM BLOCK DIAGRAM 


[Block diagram labels: MICROPROGRAM MEMORY, PIPELINE REGISTER, CONTROL SIGNALS]


IMOX is a trademark of Advanced Micro Devices, Inc. 
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Am20334 
REGISTER 
FILE 
64 x 18 


32 x 32
MULTIPLIER
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CONNECTION DIAGRAM 
Top View


PGA


[Pin-grid layout not reproduced.]


Abbreviations used in the diagram:

16/32 = S16/32
GNDE = Ground, ECL
GNDT = Ground, TTL
I/D = IEEE/DEC
INEX = INEXACT
INVA = INVALID
OBUS = ONEBUS
OVFL = OVERFLOW
P/AFF = PROJ/AFF
UNFL = UNDERFLOW
VCCE = VCC, ECL
VCCT = VCC, TTL





*D4 is an alignment pin (not connected internally). 
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PIN DESIGNATIONS 
(Sorted by Pin No.) 
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PIN DESIGNATIONS (Cont'd.) 






(Sorted by Pin Name)


PIN NO. | PIN NAME | PIN NO. | PIN NAME | PIN NO. | PIN NAME | PIN NO. | PIN NAME
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ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:

a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29325 G C B


a. DEVICE NUMBER/DESCRIPTION
   Am29325
   32-Bit Floating-Point Processor

b. SPEED OPTION
   Not Applicable

c. PACKAGE TYPE
   G = 145-Terminal Pin Grid Array (CG 145)

d. TEMPERATURE RANGE
   C = Commercial (0 to +85°C) Case

e. OPTIONAL PROCESSING
   Blank = Standard processing
   B = Burn-in


Valid Combinations


Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released combinations, and
to obtain additional data on AMD's standard military grade
products.
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PIN DESCRIPTION 


R0-R31 R Operand Bus (Input)
R0 is the least-significant bit.


S0-S31 S Operand Bus (Input)
S0 is the least-significant bit.


F0-F31 F Operand Bus (Output)
F0 is the least-significant bit.


CLK Clock (Input)
For the internal registers. 


ENR Register R Clock Enable (Input; Active LOW) 
When ENR is LOW, register R is clocked on the LOW-to- 
HIGH transition of CLK. When ENR is HIGH, register R 
retains the previous contents. 


ENS Register S Clock Enable (Input; Active LOW) 
When ENS is LOW, register S is clocked on the LOW-to- 
HIGH transition of CLK. When ENS is HIGH, register S 
retains the previous contents. 


ENF Register F Clock Enable (Input; Active LOW) 
When ENF is LOW, register F is clocked on the LOW-to- 
HIGH transition of CLK. When ENF is HIGH, register F 
retains the previous contents. 


OE Output Enable (Input; Active LOW)
When OE is LOW, the contents of register F are placed on
F0-F31. When OE is HIGH, F0-F31 assume a
high-impedance state.


ONEBUS Input Bus Configuration Control (Input) 
A LOW on ONEBUS configures the input bus circuitry for 
two-input bus operation. A HIGH on ONEBUS configures 
the input bus circuitry for single-input bus operation. 


FT0 Input Register Feedthrough Control (Input;
Active HIGH)
When FT0 is HIGH, registers R and S are transparent.


FT1 Output Register Feedthrough Control (Input;
Active HIGH)
When FT1 is HIGH, register F and the status flag register
are transparent.


I0-I2 Operation Select Lines (Input)
Used to select the operation to be performed by the ALU.
See Table 1 for a list of operations and the corresponding
codes.


I3 ALU S Port Input Select (Input)
A LOW on I3 selects register S as the input to the ALU S
port. A HIGH on I3 selects register F as the input to the ALU
S port.


Definition of Terms 
Affine Mode 


One of two modes affecting the handling of operations on
infinities; see the Operations with Infinities section under
Operation in IEEE Mode.


Biased Exponent


The true exponent of a floating-point number, plus a constant.
For IEEE floating-point numbers, the constant is 127; for DEC
floating-point numbers, the constant is 128. See also True
Exponent.


Bus


Data input or output channel for the floating-point processor.


I4 Register R Input Select (Input)
A LOW on I4 selects R0-R31 as the input to register R. A
HIGH selects the ALU F port as the input to register R.


IEEE/DEC IEEE/DEC Mode Select (Input)
When IEEE/DEC is HIGH, IEEE mode is selected. When
IEEE/DEC is LOW, DEC mode is selected.


S16/32 16- or 32-Bit I/O Mode Select (Input)
A LOW on S16/32 selects the 32-bit I/O mode; a HIGH
selects the 16-bit I/O mode. In 32-bit mode, input and
output buses are 32 bits wide. In 16-bit mode, input and
output buses are 16 bits wide, with the least- and
most-significant portions of the 32-bit input and output words
being placed on the buses during the HIGH and LOW
portions of CLK, respectively.


RND0, RND1 Rounding Mode Selects (Input)
RND0 and RND1 select one of four rounding modes. See
Table 5 for a list of rounding modes and the corresponding
control codes.


PROJ/AFF Projective/Affine Mode Select (Input) 
Choice of projective or affine mode determines the way in 
which infinities are handled in IEEE mode. A LOW on 
PROJ/AFF selects affine mode; a HIGH selects projective 
mode. 


OVERFLOW Overflow Flag (Output; Active HIGH)
A HIGH indicates that the last operation produced a final
result that overflowed the floating-point format.


UNDERFLOW Underflow Flag (Output; Active HIGH)
A HIGH indicates that the last operation produced a
rounded result that underflowed the floating-point format.


ZERO Zero Flag (Output; Active HIGH) 
A HIGH indicates that the last operation produced a final 
result of zero. 


NAN Not-a-Number Flag (Output; Active HIGH) 
A HIGH indicates that the final result produced by the last 
operation is not to be interpreted as a number. The output in 
such cases is either an IEEE Not-a-Number (NAN) or a 
DEC-reserved operand. 


INVALID Invalid Operation Flag (Output; Active HIGH)
A HIGH indicates that the last operation performed was
invalid; e.g., ∞ times 0.


INEXACT  Inexact Result Flag (Output; Active HIGH) 
A HIGH indicates that the final result of the last operation 
was not infinitely precise, due to rounding. 


DEC-Reserved Operand 


A DEC floating-point number that is interpreted as a symbol 
and has no numeric value. A DEC-reserved operand has a 
sign of 1 and a biased exponent of 0. 


Destination Format 


The format of the final result produced by the floating-point 
ALU. The destination format can be IEEE floating point, DEC 
floating point, or integer. 


Final Result


The result produced by the floating-point ALU.


Fraction


The 23 least-significant bits of the mantissa.
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Infinitely Precise Result - 


The result that would be obtained from an operation if both
exponent range and precision were unbounded.


Input Operands


The value or values on which an operation is performed. For 
example, the addition 2 + 3 = 5 has input operands 2 and 3. 


Mantissa 


The portion of a floating-point number containing the number's
significant bits. For the floating-point number 1.101 × 2^-3, the
mantissa is 1.101.


NAN (Not-a-Number) 


An IEEE floating-point number that is interpreted as a symbol,
and has no numeric value. A NAN has a biased exponent of
255₁₀ and a non-zero fraction.


Port


Data input or output channel for the floating-point ALU.


Projective Mode


One of two modes affecting the handling of operations on
infinities; see the Operations with Infinities section under
Operation in IEEE Mode.


Rounded Result


The result produced by rounding the infinitely precise result to
fit the destination format.


True Exponent (or Exponent)


Number representing the power of two by which a
floating-point number's mantissa is to be multiplied. For the
floating-point number 1.101 × 2^-3, the true exponent is -3.


FUNCTIONAL DESCRIPTION
Architecture


The Am29325 comprises a high-speed floating-point ALU, a
status flag generator, and a 32-bit data path.


Floating-Point ALU


The floating-point ALU performs 32-bit floating-point
operations. It also performs floating-point-to-integer conversions,
integer-to-floating-point conversions, and conversions
between the IEEE and DEC formats. The ALU has
two 32-bit input ports, R and S, and a 32-bit output port, F.


Conceptually, the process performed by the ALU can be
divided into three stages (see Figure 1). The operation stage
performs the arithmetic operation selected by the user; the
output of this stage is referred to as the infinitely precise
result of the operation. The rounding stage rounds the
infinitely precise result to fit in the destination format; the
output of this stage is called the rounded result. The last stage
checks for exceptional conditions. If no exceptional condition
is found, the rounded result is passed through this stage. If
some exceptional condition is found (e.g., overflow, underflow,
or an invalid operation), this stage may replace the rounded
result with another output, such as +∞, -∞, a NAN, or a
DEC-reserved operand. The output of this last stage appears on
port F, and is called the final result.


OPERAND R, OPERAND S
        |
OPERATION STAGE
(PERFORMS SELECTED OPERATION)
        |   INFINITELY PRECISE RESULT
ROUNDING STAGE
(ROUNDS INFINITELY PRECISE RESULT)
        |   ROUNDED RESULT
EXCEPTION STAGE
(CHECKS FOR UNUSUAL CONDITIONS)
        |
F: FINAL RESULT

AF004540


Figure 1. Conceptual Model of the Process 
Performed by the Floating-Point ALU 


The ALU performs one of eight operations; the operation to be
performed is selected by placing the appropriate control code
on lines I2-I0. Table 1 gives the control codes corresponding
to each of the eight operations.


The floating-point addition operation (R PLUS S) adds the
floating-point numbers on ports R and S, and places the
floating-point result on port F. In IEEE mode
(IEEE/DEC = HIGH) the addition is performed in IEEE floating-point
format; in DEC mode (IEEE/DEC = LOW) the addition is
performed in DEC format.


The floating-point subtraction operation (R MINUS S)
subtracts the floating-point number on port S from the
floating-point number on port R and places the floating-point result on
port F. In IEEE mode (IEEE/DEC = HIGH) the subtraction is
performed in IEEE floating-point format; in DEC mode
(IEEE/DEC = LOW) the subtraction is performed in DEC
format.


The floating-point multiplication operation (R TIMES S) multi- 
plies the floating-point numbers on ports R and S, and places 
the floating-point result on port F. In IEEE mode (IEEE/ 
DEC = HIGH) the multiplication is performed in IEEE floating- 
point format; in DEC mode (IEEE/DEC = LOW) the multiplica- 
tion is performed in DEC format. 


The floating-point constant subtraction (2 MINUS S) operation
subtracts the floating-point value on port S from 2, and places
the result on port F. The operand on port R is not used in this
operation; its value will not affect the operation in any way. In
IEEE mode (IEEE/DEC = HIGH) the operation is performed in
IEEE floating-point format; in DEC mode (IEEE/DEC = LOW)
the operation is performed in DEC format. This operation is
used to support Newton-Raphson floating-point division; a
description of its use appears in Appendix C.


The integer-to-floating-point conversion (INT-TO-FP)
operation takes a 32-bit, two's-complement integer on port R and
places the equivalent floating-point value on port F. The
operand on port S is not used in this operation; its value will
not affect the operation in any way. In IEEE mode
(IEEE/DEC = HIGH) the result is delivered in IEEE format; in DEC
mode (IEEE/DEC = LOW) the result is delivered in DEC
format.
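
The role of the 2 MINUS S operation in Newton-Raphson division can be illustrated with a short sketch. The code below is not the sequence given in Appendix C (which is outside this section); it is the textbook reciprocal iteration x' = x(2 - bx), built only from a multiply and a 2-minus step. The function names, the seed value, and the iteration count are assumptions for illustration, and Python floats are double precision rather than the 32-bit formats of the device.

```python
def two_minus(s: float) -> float:
    """Model of the 2 MINUS S operation: F = 2 - S."""
    return 2.0 - s


def reciprocal(b: float, seed: float, iterations: int = 4) -> float:
    """Approximate 1/b by Newton-Raphson: x <- x * (2 - b * x).

    Each pass uses two multiplies and one 2 MINUS S step.
    The error term (1 - b*x) is squared on every pass, so the
    seed must already satisfy |1 - b*seed| < 1 to converge.
    """
    x = seed
    for _ in range(iterations):
        x = x * two_minus(b * x)
    return x


def divide(a: float, b: float, seed: float, iterations: int = 4) -> float:
    """Compute a/b as a * (1/b), using the reciprocal above."""
    return a * reciprocal(b, seed, iterations)
```

For example, divide(1.0, 3.0, seed=0.3) starts with an error term of 0.1, which shrinks to 0.1^16 after four passes. How the seed is obtained (for instance from a lookup table) is left open here.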


TABLE 1. ALU OPERATION SELECT


Operation mnemonics: (R PLUS S), (R MINUS S), (R TIMES S),
(2 MINUS S), (INT-TO-FP), (FP-TO-INT), (IEEE-TO-DEC),
(DEC-TO-IEEE)

The floating-point-to-integer conversion (FP-TO-INT) opera- 
tion takes a floating-point number on port R and places the 
equivalent 32-bit, two's-complement integer value on port F. 
The operand on port S is not used in this operation; its value 
will not affect the operation in any way. In IEEE mode (IEEE/ 
DEC = HIGH) the operand on port R is interpreted using the 
IEEE floating-point format; in DEC mode (IEEE/DEC = LOW) 
it is interpreted using the DEC floating-point format. 


The IEEE-to-DEC conversion operation (IEEE-TO-DEC) takes 
an IEEE-format floating-point number on port R and places the 
equivalent DEC-format floating-point number on port F. The 
operand on port S is not used in this operation; its value will 
not affect the operation in any way. The operation can be 
performed in either IEEE mode (IEEE/DEC = HIGH) or DEC 
mode (IEEE/DEC = LOW). 


The DEC-to-IEEE conversion operation (DEC-TO-IEEE) takes
a DEC-format floating-point number on port R and places the
equivalent IEEE-format floating-point number on port F. The
operand on port S is not used in this operation; its value will
not affect the operation in any way. The operation can be
performed in either IEEE mode (IEEE/DEC = HIGH) or DEC
mode (IEEE/DEC = LOW).


Status Flag Generator 


The status flag generator controls the state of six flags that
report the status of floating-point ALU operations. The flags
indicate when an operation is invalid (e.g., ∞ times 0) or when
an operation has produced an overflow, an underflow, a
non-numerical result (e.g., a NAN or DEC-reserved operand), an
inexact result, or a result of zero. The flags represent the
status of the most recently performed operation. Flag status is
stored in the flag status register on the LOW-to-HIGH
transition of CLK. When the output register feedthrough control FT1
is HIGH, the flag status register is made transparent.


Operation                                           Output Equation


Floating-point addition (R PLUS S)                  F = R + S
Floating-point subtraction (R MINUS S)              F = R - S
Floating-point multiplication (R TIMES S)           F = R * S
Floating-point constant subtraction (2 MINUS S)     F = 2 - S


Output equations for the integer-to-floating-point,
floating-point-to-integer, IEEE-TO-DEC, and DEC-TO-IEEE
conversions, in that order:
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F (floating-point) = R (integer) 









F (integer) = R (floating-point)
F (DEC format) = R (IEEE format)
F (IEEE format) = R (DEC format)


Data Path 


The 32-bit data path consists of the R and S input buses; the F 
output bus; data registers R, S, and F; the register R input 
multiplexer; and the ALU port S input multiplexer. 


Input operands enter the floating-point processor through the
32-bit R and S input buses, R0-R31 and S0-S31. Results of
operations appear on the 32-bit F bus, F0-F31. The F bus
assumes a high-impedance state when output enable OE is
HIGH.


The R and S registers store input operands; the F register
stores the final result of the floating-point ALU operation. Each
register has an independent clock enable (ENR, ENS, and
ENF). When a register's clock enable is LOW, the register
stores the data on its input at the LOW-to-HIGH transition of
CLK; when the clock enable is HIGH, the register retains its
current data. All data registers are fully edge-triggered: both
the input data and the register enable need only meet modest
setup and hold time requirements. Registers R and S can be
made transparent by setting FT0, the input register
feedthrough control, HIGH. Register F can be made transparent by
setting FT1, the output register feedthrough control, HIGH.


The register R input multiplexer selects either the R input bus
or the floating-point ALU's F port as the input to register R.
Selection is controlled by I4: a LOW selects the R input bus;
a HIGH selects the ALU F port. The ALU port S input
multiplexer selects either register S or register F as the input to
the floating-point ALU's S port. Selection is controlled by I3:
a LOW selects register S; a HIGH selects register F.


Data selected by I3 and I4 is described in Table 2. When
registers R and S are transparent (FT0 = HIGH), multiplexer
select I4 must be kept LOW, so that the register R input
multiplexer selects R0-R31. When register F is transparent
(FT1 = HIGH), multiplexer select I3 must be kept LOW, so that
the ALU port S input multiplexer selects register S.
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TABLE 2. MUX SELECT 


I3   Data selected for floating-point ALU S port
0    Register S
1    Register F


I4   Data selected for register R input
0    R input bus (R0-R31)
1    Floating-point ALU port F


I/O Modes


The Am29325 datapath can be configured in one of three I/O
modes: a 32-bit, two-input bus mode; a 32-bit, single-input bus
mode; and a 16-bit, two-input bus mode. These modes affect
only the manner in which data is delivered to and taken from
the Am29325; operation of the floating-point ALU is not
altered. The I/O mode is selected with the ONEBUS and
S16/32 controls. Table 3 lists the control codes needed to invoke
each I/O mode.


[Figure 2 diagram labels: R BUS, S BUS, I4, ENR, CLK, ONEBUS (= LOW), S16/32 (= LOW), F BUS]





TABLE 3. I/O MODE SELECTION


S16/32   ONEBUS   I/O Mode
0        0        32-bit, two-input-bus mode
0        1        32-bit, single-input-bus mode (*)
1        0        16-bit, two-input-bus mode (*)
1        1        Illegal I/O mode selection value


*FT0 must be held LOW in this mode (see text).
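
The selection rule for the I/O modes can be restated as a small lookup. The sketch below is only a software illustration of the mapping described by the ONEBUS and S16/32 pin descriptions (LOW = 0, HIGH = 1); the function names are hypothetical and nothing here models device timing.

```python
# Keys are (S16/32, ONEBUS) levels, with LOW = 0 and HIGH = 1.
IO_MODES = {
    (0, 0): "32-bit, two-input-bus mode",
    (0, 1): "32-bit, single-input-bus mode",
    (1, 0): "16-bit, two-input-bus mode",
}


def io_mode(s16_32: int, onebus: int) -> str:
    """Return the I/O mode name, or raise for the illegal combination."""
    try:
        return IO_MODES[(s16_32, onebus)]
    except KeyError:
        raise ValueError("illegal I/O mode selection value") from None


def ft0_may_be_high(s16_32: int, onebus: int) -> bool:
    """Input registers may be transparent only in 32-bit, two-input-bus mode."""
    io_mode(s16_32, onebus)  # rejects the illegal combination
    return (s16_32, onebus) == (0, 0)
```

A check such as ft0_may_be_high(1, 0) returning False reflects the note that FT0 must be held LOW in the single-input-bus and 16-bit modes.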


32-Bit, Two-Input Bus Mode 


In this I/O mode, the R and S buses are configured as
independent 32-bit input buses, and the F bus is configured as
a 32-bit output bus. Figure 2 is a functional block diagram of
the Am29325 in this I/O mode.


R and S operands are taken from their respective input buses
and clocked into the R and S registers on the LOW-to-HIGH
transition of CLK. Register F is also clocked on the
LOW-to-HIGH transition of CLK. Figure 5(a) depicts typical I/O timing
in this mode.








BD007050


Figure 2. Functional Block Diagram for the 32-Bit, Two-Input Bus Mode
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32-Bit, Single-iInput Bus Mode 


In this I/O mode, the R and S buses are connected to a single
32-bit multiplexed input data bus; the F bus is configured as an
independent 32-bit output bus. Figure 3 is a functional block
diagram of the Am29325 in this I/O mode. Note that both the
R and S bus lines must be wired to the input bus.


R and S operands are multiplexed onto the input bus by the 
host system. The S operand is clocked from the input bus into 
a temporary holding register on the HIGH-to-LOW transition of 
CLK and is transferred to register S on the LOW-to-HIGH 


transition of CLK. The R operand is clocked from the input bus 
into register R on the LOW-to-HIGH transition of CLK. Register 
F is clocked on the LOW-to-HIGH transition of CLK. Figure 
5(b) depicts typical I/O timing in this mode. 


When placed in this I/O mode, the data path will not function
properly if the R and S registers are made transparent.
Therefore, input register feedthrough control FT0 must be held
LOW in this mode.
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e CJ ENS 


FLOATING-POINT 
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ALU 
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Am29325 
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Figure 3. Functional Block Diagram for the 32-Bit, Single-input Bus Mode 
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16-Bit, Two-Input Bus Mode 


In this I/O mode, the R and S buses are configured as
independent 16-bit input buses, and the F bus is configured as
a 16-bit output bus. Figure 4 is a functional block diagram of
the Am29325 in this I/O mode. Note that the 16
least-significant bits (LSBs) and 16 most-significant bits (MSBs) of
the R, S, and F buses must be wired to their respective system
buses in parallel.


Thirty-two-bit operands are passed along the 16-bit data
buses by time-multiplexing the 16 LSBs and 16 MSBs of each
32-bit word. For the R input bus, the host system multiplexes
the 16 LSBs and 16 MSBs of the R operand onto the 16-bit R
bus. The 16 LSBs of the R operand are stored in a temporary
holding register on the HIGH-to-LOW transition of CLK. The 16
MSBs are clocked into register R on the LOW-to-HIGH
transition of CLK; at the same time, the 16 LSBs are
transferred from the temporary holding register to register R.
Transfer of data from the S input bus to the S register takes
place in a similar fashion. Register F is clocked on the
LOW-to-HIGH transition of CLK. Circuitry internal to the Am29325
multiplexes data from register F onto the 16-bit output bus by
enabling the 16 LSBs of the F output bus when CLK is HIGH,
and enabling the 16 MSBs of the F output bus when CLK is
LOW. Figure 5(c) depicts typical I/O timing in this mode.
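
The time-multiplexing just described can be modeled behaviorally. The sketch below is a simplified illustration, not a description of the internal circuitry: the class and function names are hypothetical, and clock enables, feedthrough controls, and setup/hold timing are ignored.

```python
class MuxedInputRegister16:
    """Behavioral model of register R (or S) in 16-bit, two-input-bus mode."""

    def __init__(self) -> None:
        self.holding = 0  # temporary holding register for the 16 LSBs
        self.value = 0    # the 32-bit register contents

    def falling_edge(self, bus16: int) -> None:
        """HIGH-to-LOW transition of CLK: capture the 16 LSBs."""
        self.holding = bus16 & 0xFFFF

    def rising_edge(self, bus16: int) -> None:
        """LOW-to-HIGH transition of CLK: load the MSBs and transfer the LSBs."""
        self.value = ((bus16 & 0xFFFF) << 16) | self.holding


def f_bus_output(f_register: int, clk_high: bool) -> int:
    """16 LSBs of register F while CLK is HIGH, 16 MSBs while CLK is LOW."""
    if clk_high:
        return f_register & 0xFFFF
    return (f_register >> 16) & 0xFFFF
```

Presenting 0x0000 at the falling edge and 0x4060 at the following rising edge leaves the model register holding 0x40600000.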


When placed in this I/O mode, the data path will not function
properly if the R and S registers are made transparent.
Therefore, input register feedthrough control FT0 must be held
LOW in this mode. Caution must also be taken in controlling
the register R input multiplexer control line, I4, in this I/O
mode. I4 should be changed only when CLK is HIGH, in
addition to meeting the setup and hold time requirements
given in the Switching Characteristics section.


[Figure 4 diagram labels: R BUS, S BUS, F BUS, ENF, ONEBUS (= LOW), S16/32 (= HIGH)]


Operation in IEEE Mode 


When input signal IEEE/DEC is HIGH, the IEEE mode of 
operation is selected. In this mode the Am29325 uses the 
floating-point format set forth in the IEEE Proposed Standard 
for Binary Floating-Point Arithmetic, P754. In addition, the 
IEEE mode complies with most other aspects of single- 
precision floating-point operation outlined in the proposed 
standard — differences are discussed in Appendix A. 


IEEE Floating-Point Format 


The IEEE single-precision floating-point word is 32 bits wide, 
and is arranged in the format shown in Figure 6. The floating- 
point word is divided into three fields: a single-bit sign, an 8-bit 
biased exponent, and a 23-bit fraction. 


The sign bit indicates the sign of the floating-point number's 
value. Non-negative values have a sign of 0; negative values, 
a sign of 1. The value zero may have either sign. 


The biased exponent is an 8-bit unsigned integer field
representing a multiplicative factor of some power of two. The bias
value is 127. If, for example, the multiplicative factor for a
floating-point number is to be 2^a, the value of the biased
exponent would be a + 127; "a" is called the true exponent.


The fraction is a 23-bit unsigned fraction field containing the
23 LSBs of the floating-point number's 24-bit mantissa. The
weight of the fraction's MSB is 2^-1; the weight of the LSB is 2^-23.





BD007070 


Figure 4. Functional Block Diagram for the 16-Bit, Two-Input Bus Mode 
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A floating-point number is evaluated or interpreted per the 
following conventions: 


let s = sign bit
    e = biased exponent
    f = fraction


if e = 0 and f = 0 ... value = (-1)^s × 0 (+0, -0)

if e = 0 and f ≠ 0 ... value = denormalized number

if 0 < e < 255 ... value = (-1)^s × 2^(e-127) × (1.f)
(normalized number)

if e = 255 and f = 0 ... value = (-1)^s × ∞ (+∞, -∞)

if e = 255 and f ≠ 0 ... value = not-a-number (NAN)
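
The conventions above translate directly into a small decoder. The following sketch is an illustration only; the function name and the (kind, value) return shape are assumptions, and denormalized numbers and NANs are reported without a numeric value.

```python
def classify_ieee(word: int):
    """Classify a 32-bit IEEE single-precision word.

    Returns (kind, value); value is None for denormalized numbers and NANs.
    """
    s = (word >> 31) & 1       # sign bit
    e = (word >> 23) & 0xFF    # 8-bit biased exponent
    f = word & 0x7FFFFF        # 23-bit fraction
    sign = -1.0 if s else 1.0

    if e == 0:
        if f == 0:
            return ("zero", sign * 0.0)
        return ("denormalized", None)
    if e == 255:
        if f == 0:
            return ("infinity", sign * float("inf"))
        return ("nan", None)
    # Normalized: (-1)^s * 2^(e-127) * 1.f, with the leading 1 implied.
    return ("normalized", sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23))
```

As a usage example, classify_ieee(0x40600000) yields ("normalized", 3.5).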


Zero: The value zero can have either a positive or negative 
sign. Rules for determining the sign of a zero produced by an 
operation are given in the Sign Bit section. 


Denormalized Number: A denormalized number represents a
quantity with magnitude less than 2^-126 but greater than zero.
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Normalized Number: A normalized number represents a 
quantity with magnitude greater than or equal to 2^-126 but
less than 2^128.


Example 1: 


The number +3.5 can be represented in floating-point
format as follows:


+3.5 = 11.1₂ × 2^0
     = 1.11₂ × 2^1


sign = 0


biased exponent = 1₁₀ + 127₁₀ = 128₁₀
                = 10000000₂


fraction = 11000000000000000000000₂
(the leading 1 is implied in the format)


Concatenating these fields produces the floating-point word
40600000₁₆.














c) 16-Bit, Two-Input-Bus Mode


Figure 5. Typical Bus Timing for the I/O Modes with FT0 = LOW, FT1 = LOW


BIT NUMBER:   31             30 ... 23               22 ... 0
FIELD:        SIGN BIT (S)   BIASED EXPONENT (E)     FRACTION (F)

Fraction bit weights run from 2^-1 (bit 22) down to 2^-23 (bit 0).


VALUE = (-1)^S (2^(E-127)) (1.F)


TB000640


Figure 6. IEEE Mode Single-Precision Floating-Point Format 


Example 2: 


The number -11.375 can be represented in floating-point
format as follows:


-11.375 = -1011.011₂ × 2^0
        = -1.011011₂ × 2^3


sign = 1
biased exponent = 3₁₀ + 127₁₀ = 130₁₀
                = 10000010₂


fraction = 01101100000000000000000₂
(the leading 1 is implied in the format)


Concatenating these fields produces the floating-point word
C1360000₁₆.
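
The two worked examples can be checked in software. The sketch below relies on Python's standard struct module, whose "f" format packs IEEE 754 single precision; treating that as equivalent to the P754 format described here is an assumption, and the helper names are hypothetical.

```python
import struct


def float_to_word(x: float) -> int:
    """Return the 32-bit single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fields(word: int):
    """Split a 32-bit word into (sign, biased exponent, fraction)."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


if __name__ == "__main__":
    for x in (3.5, -11.375):
        w = float_to_word(x)
        s, e, f = fields(w)
        print(f"{x:>8}: word={w:08X} sign={s} exponent={e} fraction={f:023b}")
```

Both 3.5 and -11.375 are exactly representable, so no rounding is involved in these conversions.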




Infinity: Infinity can have either a positive or negative sign.
The way in which infinities are interpreted is determined by the
state of the projective/affine mode select, PROJ/AFF.


Not-a-Number: A not-a-number, or NAN, does not represent
a numeric value, but is interpreted as a signal or symbol. NANs
are used to indicate invalid operations, and as a means of
passing process status information through a series of
calculations. NANs arise in two ways: 1) they can be generated by the
Am29325 to indicate that an invalid operation has taken place
(e.g., ∞ × 0), or 2) they can be provided by the user as an input
operand. There are two types of NANs, signalling and quiet
(see Figure 7 for formats).


IEEE Mode Integer Format


Integer numbers are represented as 32-bit, two's-complement
words (Figure 8 depicts the integer format). The integer word
can represent a range of integer values from -2^31 to 2^31 - 1.


[Figure 7 bit layouts (sign bit, biased exponent, fraction; bits 31-0) are not reproduced. Notes from the figure:]

X = DON'T CARE

AT LEAST ONE OF THE
TWENTY-TWO LSBs OF A QUIET NAN
MUST BE 1


TB000650


Figure 7. Signalling and Quiet NAN Formats 


BIT NUMBER:   31      30     29     28     27    ...   0
WEIGHT:      -2^31    2^30   2^29   2^28   2^27  ...   2^0


TB000660


Figure 8. 32-Bit Integer Format 


Operations 


All eight floating-point ALU operations discussed in the 
Functional Description section can be performed in IEEE 
mode. Various exceptional aspects of the R PLUS S, R MINUS 
S, R TIMES S, 2 MINUS S, INT-TO-FP, and FP-TO-INT 
operations for this mode are described below. The IEEE-TO- 
DEC and DEC-TO-IEEE operations are discussed separately 
in the IEEE-TO-DEC AND DEC-TO-IEEE Operations section. 


Operations with NANs: NANs arise in two ways: 1) they can
be generated by the Am29325 to indicate that an invalid
operation has taken place (e.g., ∞ × 0), or 2) they can be provided by
the user as an input operand. There are two types of NANs,
signalling and quiet (see Figure 7 for formats).


Signalling NANs set the invalid operation flag when they 
appear as an input operand to an operation. They are useful 
for indicating uninitialized variables, or for implementing user- 
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designed extensions to the operations provided. The ALU 
never produces a signalling NAN as the final result of an 
operation. 


Quiet NANs are generated for invalid operations. When they
appear as an input operand, they are passed through most
operations without setting the invalid flag, the
floating-point-to-integer conversion operation being the exception.


The sign of any input operand NAN is ignored. All quiet NANs 
produced as the final result of an operation have a sign of 0. 


When a NAN appears as an input operand, the final result of 
the operation is a quiet NAN that is created by taking the input 
NAN and forcing bit 22 LOW and bit 21 HIGH. If an operation 
has two NANs as input operands, the resulting quiet NAN is 
created using the NAN on the R port. 


When a quiet NAN is produced as the final result of an invalid
operation whose input operand or operands are not NANs, the
resulting NAN will always have the value 7FA00000₁₆.


The NAN flag will be HIGH whenever an operation produces a 
NAN as a final result. 


Example 1: 


Suppose the floating-point addition operation is performed
with the following input operands:


R port: 3F800000₁₆ (1.0 × 2^0)
S port: 7FC12345₁₆ (signalling NAN)


Result: The signalling NAN on the S port is converted to a
quiet NAN by forcing bit 22 LOW and bit 21 HIGH.
The operation's final result will be 7FA12345₁₆.
Since one of the two input operands is a signalling
NAN, the invalid flag will be HIGH; the NAN flag will
also be HIGH.


Example 2: 


Suppose the floating-point multiplication operation is per- 
formed with the following input operands: 


R port: FFF1111146 (signalling NAN) 
S port: 7FC222221¢6 (quiet NAN) 


Result: Since both input operands are NANs, the NAN on 
the R port is chosen for output. In addition to forcing 
bit 22 LOW, the sign bit (bit 31) is set LOW (bit 21 is 
already HIGH, and need not be changed). The 
operation's final result will be 7—-B1111146. Since 
one of the two input operands is a signalling NAN, 


the invalid flag is HIGH; the NAN flag will also be 


HIGH. 
Example 3: 


Suppose the floating-point subtraction operation is per- 
formed with the following input operands: 


R port: FF800001₁₆ (quiet NAN) 
S port: 7F800000₁₆ (+∞) 


Result: To create the final result, the quiet NAN's sign bit (bit 
31) is forced LOW and bit 21 is forced HIGH (bit 22 
is already LOW, and need not be changed). The final 
result will be 7FA00001₁₆. The NAN flag will be 
HIGH. 


Operations with Denormalized Numbers: The proposed 
IEEE standard incorporates denormalized numbers to allow a 
means of gradual underflow for operations that produce non- 
zero results too small to be expressed as a normalized 
floating-point number. The Am29325 does not support gradual 
underflow. If a floating-point operation produces a non-zero 
rounded result that is not large enough to be expressed as a 
normalized floating-point number, the final result will be a zero 
of the same sign; the inexact, underflow, and zero flags will be 
HIGH. If an input operand is a denormalized number, the 
floating-point ALU will assume that operand to be a zero of the 
same sign. 


Operations Producing Overflows: If an operation has a finite 
input operand or operands, and if the operation produces a 
rounded result that is too large to fit in the destination format, 
the operation is said to have overflowed. 


A floating-point overflow occurs if an R PLUS S, R MINUS S, R 
TIMES S, or 2 MINUS S operation with finite input operand(s) 
produces a result which, after rounding, has a magnitude 
greater than or equal to 2^128. Positive or negative infinity will 
appear as the final result if the rounded result is positive or 
negative, respectively, and the overflow and inexact flags will 
be HIGH. 


Integer overflow occurs when the floating-point-to-integer 
conversion operation attempts to convert a number which, 
after rounding, is greater than 2^31 - 1 or less than -2^31. The 
final result will be quiet NAN 7FA00000₁₆, and the invalid 
operation and NAN flags will be HIGH. Note that the overflow 
and inexact flags remain LOW for integer overflow. 


Operations Producing Underflows: If an operation produces 
a floating-point rounded result having a magnitude too small to 
be expressed as a normalized floating-point number, but 
greater than zero, that operation is said to have underflowed. 
Underflow occurs when an R PLUS S, R MINUS S, or R 
TIMES S operation produces a result which, after rounding, 
has a magnitude in the range: 


0 < magnitude < 2^-126. 


In such cases, the final result will be +0 (00000000₁₆) if the 
rounded result is non-negative, and -0 (80000000₁₆) if the 
rounded result is negative. The underflow, inexact, and zero 
flags will be HIGH. 


Underflow does not occur if the destination format is integer. If 
the infinitely precise result of a floating-point-to-integer con- 
version has a magnitude greater than 0 and less than 1, but 
the rounded result is 0, the underflow flag remains LOW. 
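

The floating-point overflow and underflow rules above can be summarized as a range check on the rounded result. The following is a minimal sketch, assuming the rounded result is available as an exact `Fraction`; the function name `ieee_range_check` and the flag names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

POS_INF, NEG_INF = 0x7F800000, 0xFF800000
POS_ZERO, NEG_ZERO = 0x00000000, 0x80000000


def ieee_range_check(rounded):
    """Return (final word, flags) if the rounded result overflows or
    underflows a floating-point destination, else (None, empty set)."""
    mag = abs(rounded)
    negative = rounded < 0
    if mag >= Fraction(2) ** 128:
        # Overflow: infinity of the same sign.
        return (NEG_INF if negative else POS_INF), {"overflow", "inexact"}
    if 0 < mag < Fraction(1, 2 ** 126):
        # Underflow: no gradual underflow, so flush to a zero of the same sign.
        return (NEG_ZERO if negative else POS_ZERO), {"underflow", "inexact", "zero"}
    return None, set()
```

For example, `ieee_range_check(-Fraction(1, 2**127))` yields the word 80000000₁₆ with the underflow, inexact, and zero flags, while any in-range value yields `(None, set())`.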


Operations with Infinities: In most cases, positive and 
negative infinity are valid inputs for the R PLUS S, R MINUS S, 
R TIMES S, and 2 MINUS S operations. Those cases for which 
infinities are not valid inputs for these operations are listed in 
Table 4. 


Infinities in IEEE mode can be handled either as projective or 
affine. The projective mode is selected when PROJ/AFF is 
HIGH; the affine mode is selected when PROJ/AFF is LOW. 
The only differences between the modes that are relevant to 
Am29325 operation occur during the addition and subtraction 
of infinities: 


Operation       Affine Mode   Projective Mode 
(+∞) + (+∞)     Output +∞     Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(-∞) + (-∞)     Output -∞     Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(+∞) - (-∞)     Output +∞     Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(-∞) - (+∞)     Output -∞     Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 


If an R PLUS S, R MINUS S, or 2 MINUS S operation has 
infinity as an input operand or operands, the final result, if 
valid, is presumed to be exact. For example, adding +∞ and 
2.0 will produce a final result of +∞; since the result is 
considered exact, the inexact flag remains LOW. 
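

As a rough illustration of how the PROJ/AFF selection affects the sum or difference of two infinities, consider the sketch below. It is a model of the behavior described in this section, not of the hardware; the function names and the `projective` parameter are hypothetical stand-ins for the PROJ/AFF input.

```python
POS_INF, NEG_INF, QNAN = 0x7F800000, 0xFF800000, 0x7FA00000


def add_infinities(r, s, projective):
    """R PLUS S for two infinite operands: (final word, flags set HIGH)."""
    if r not in (POS_INF, NEG_INF) or s not in (POS_INF, NEG_INF):
        raise ValueError("both operands must be infinities")
    if r != s:
        # Opposite signs: invalid in both modes.
        return QNAN, {"invalid", "nan"}
    if projective:
        # Same sign: invalid in projective mode only.
        return QNAN, {"invalid", "nan"}
    return r, set()


def sub_infinities(r, s, projective):
    """R MINUS S, modeled as R PLUS (-S)."""
    return add_infinities(r, s ^ 0x80000000, projective)
```

In affine mode `add_infinities(POS_INF, POS_INF, False)` returns +∞ with no flags set; the same call with `projective=True` returns the quiet NAN 7FA00000₁₆ with the invalid and NAN flags.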


Invalid Operations: If an input operand is invalid for the 
operation to be performed, that operation is considered 
invalid. When an invalid operation is performed, the floating- 
point ALU produces a quiet NAN as the final result, and the 
invalid operation flag goes HIGH. Table 4 lists the cases for 
which the invalid flag is HIGH in IEEE mode, and the final 
results produced for these operations. 


TABLE 4. IEEE MODE INVALID OPERATIONS 


Operation    Input Operand                  Final Result 

R PLUS S     (+∞) + (-∞)                    7FA00000₁₆ 
             or (-∞) + (+∞)                 (quiet NAN) 

R PLUS S     (+∞) + (+∞)                    7FA00000₁₆ 
             or (-∞) + (-∞) (Note 1)        (quiet NAN) 

R MINUS S    (+∞) - (+∞)                    7FA00000₁₆ 
             or (-∞) - (-∞)                 (quiet NAN) 

R MINUS S    (+∞) - (-∞)                    7FA00000₁₆ 
             or (-∞) - (+∞) (Note 1)        (quiet NAN) 

R TIMES S    (+0) * (+∞)                    7FA00000₁₆ 
             or (+0) * (-∞)                 (quiet NAN) 
             or (-0) * (+∞) 
             or (-0) * (-∞) 

R PLUS S     R or S is a signalling         (Note 2) 
R MINUS S    NAN 
R TIMES S 

2 MINUS S    S is a signalling NAN          (Note 2) 

FP-TO-INT    R is a signalling or           (Note 2) 
             quiet NAN 

FP-TO-INT    R > 2^31 - 1                   7FA00000₁₆ 
             or R < -(2^31)                 (quiet NAN) 


Notes: 1. These cases are invalid in projective mode only. 
2. Results for these operations are described in the Operations 
with NANs section. 


The Sign Bit 


For most floating-point operations, the sign bit of the final 
result is unambiguous; i.e., there is only one sign bit value that 
yields a numerically correct result. Operations that produce an 
infinitely precise result of zero, however, present a problem, as 
the IEEE floating-point format allows for representation of both 
+0 and -0. The following rules can be used to determine the 
signs of zero produced in such cases. 


R PLUS S: The operations +x + (-x) and -x + (+x) produce a 
final result of zero; the sign of the zero is dependent on the 
rounding mode: 


Rounding Mode       Sign of Final Result 
Round to nearest    0 
Round toward -∞     1 
Round toward +∞     0 
Round toward 0      0 










Operations +0 + (-0) and -0 + (+0) produce a result of 0, 
with the sign of the result determined by the table above. 


The operation +0 + (+0) produces a final result of +0; the 
operation -0 + (-0) produces a final result of -0. 


R MINUS S: The operations +x - (+x) and -x - (-x) produce a 
final result of zero; the sign of the zero is dependent on the 
rounding mode: 


Rounding Mode       Sign of Result 
Round to nearest    0 
Round toward -∞     1 
Round toward +∞     0 
Round toward 0      0 


Operations +0 - (+0) and -0 - (-0) produce a result of 0, with 
the sign of the result determined by the table above. 


The operation +0 - (-0) produces a final result of +0; the 
operation -0 - (+0) produces a final result of -0. 


R TIMES S: The sign of any multiplication result other than a 
NAN is the exclusive OR of the signs of the input operands. 
Therefore, if x is non-negative, 

+0 times +x produces a final result of +0, 

+0 times -x produces a final result of -0, 

-0 times +x produces a final result of -0, 

-0 times -x produces a final result of +0. 


2 MINUS S: If S equals 2, the final result is -0 for the round 
toward -∞ mode, and +0 for all other rounding modes. 
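

The sign-of-zero rules in this section can be condensed into a short sketch. One assumption is labeled in the code: for sums of operands with opposite signs, the sketch follows the IEEE convention (a sign of 1 only when rounding toward -∞), which matches the 2 MINUS S rule stated above. The function and mode names are hypothetical.

```python
def zero_sign_add(sign_r, sign_s, mode):
    """Sign bit of an exactly-zero R PLUS S result, given operand sign bits."""
    if sign_r == sign_s:
        # (+0) + (+0) = +0 and (-0) + (-0) = -0, in every rounding mode.
        return sign_r
    # Assumption (IEEE convention): opposite-signed operands that cancel
    # give -0 only when rounding toward minus infinity.
    return 1 if mode == "toward_neg_inf" else 0


def zero_sign_sub(sign_r, sign_s, mode):
    """R MINUS S is treated as R PLUS (-S)."""
    return zero_sign_add(sign_r, sign_s ^ 1, mode)


def sign_mul(sign_r, sign_s):
    """Sign of any non-NAN product: exclusive OR of the operand signs."""
    return sign_r ^ sign_s
```

For instance, `zero_sign_sub(0, 1, "toward_neg_inf")` models +0 - (-0) and returns 0 (a result of +0), independent of the rounding mode.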


Rounding 


Rounding is performed whenever an operation produces an 
infinitely precise result that cannot be represented exactly in 
the destination format. For example, suppose a floating-point 
operation produces the infinitely precise result: 


1.10101010101010101010101\01 x 2^0. 


In this example, the fraction portion of the mantissa has 25 
bits; the IEEE floating-point format can accommodate only 23. 
The backslash (\) in the mantissa represents the boundary 
between the first 23 bits of the fraction and any remaining bits. 
Rounding is the process by which this result is approximated 
by a representation that fits the destination format. 


There are four rounding modes in IEEE mode: 1) round to 
nearest, 2) round toward +∞, 3) round toward -∞, and 4) 
round toward 0. The rounding mode is chosen using the 
rounding mode select lines, RND0 and RND1. Table 5 lists the 
select states needed to obtain the desired rounding mode. 


TABLE 5. ROUNDING MODE SELECT 


RND1   RND0   Rounding Mode 
0      0      Round to nearest 
0      1      Round toward -∞ 
1      0      Round toward +∞ 
1      1      Round toward 0 
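

The four rounding modes can be modeled with exact rational arithmetic. The sketch below rounds a value to a 24-bit mantissa (one implied bit plus the 23-bit fraction); it assumes an unbounded exponent, so overflow and underflow are not handled, and the function and mode names are hypothetical rather than anything defined by the Am29325.

```python
from fractions import Fraction
import math


def round_to_single(x, mode):
    """Round an exact value to 24 significant bits in the given mode."""
    if x == 0:
        return Fraction(0)
    mag = abs(x)
    # Find e such that 2^e <= |x| < 2^(e+1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - 23)      # weight of the fraction's LSB
    q = x / ulp                        # x measured in LSBs (signed)
    lo = math.floor(q)                 # next-smaller candidate
    rem = q - lo
    if rem == 0:
        n = lo                         # exactly representable
    elif mode == "toward_neg_inf":
        n = lo
    elif mode == "toward_pos_inf":
        n = lo + 1
    elif mode == "toward_zero":
        n = lo if q > 0 else lo + 1
    elif mode == "nearest":
        if rem > Fraction(1, 2):
            n = lo + 1
        elif rem < Fraction(1, 2):
            n = lo
        else:
            n = lo if lo % 2 == 0 else lo + 1   # tie: pick LSB of zero
    else:
        raise ValueError("unknown rounding mode")
    return n * ulp


x = 2 ** 20 + Fraction(1, 16) + Fraction(1, 32)      # 2^20 + 2^-4 + 2^-5
print(round_to_single(x, "nearest") == 2 ** 20 + Fraction(1, 8))   # True
print(round_to_single(x, "toward_zero") == 2 ** 20)                # True
```

Using `Fraction` keeps the "infinitely precise result" exact, so the mode logic is the only source of rounding in the sketch.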
Round to Nearest: In this rounding mode the infinitely precise 
result of an operation is rounded to the closest representation 
that fits in the destination format. If the infinitely precise result 
is exactly halfway between two representations, it is rounded 
to the representation having an LSB of zero. Rounding is 
performed both for floating-point and integer destination 
formats. 


Figure 9 illustrates four examples of the round-to-nearest 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is repre- 
sented by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 

In Figure 9(a), the infinitely precise result of an operation is: 

2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 

The result is rounded to the closest representable floating- 
point value, 

2^20 + 2^-3 = 1.00000000000000000000001 x 2^20 


Example 2: 

In Figure 9(b), the infinitely precise result of an operation is: 

2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 

This result is rounded to the closest representable floating- 
point value, 

2^20 - 2^-4 = 1.11111111111111111111111 x 2^19 


Example 3: 

In Figure 9(c), the infinitely precise result of an operation is: 

-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 

This result is exactly halfway between two representable 
floating-point values. Accordingly, it is rounded to the 
closest representation with an LSB of zero, or 

-(2^20 + 2*2^-3) = -1.00000000000000000000010 x 2^20 


Example 4: 

In Figure 9(d), the infinitely precise result of an operation is: 

2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 

This result can be represented exactly in the floating-point 
format, and is left unaltered by the rounding process. 


[Figure 9: number lines (a) through (d) for Examples 1 through 4; drawing AF004550] 

Figure 9. Floating-Point Rounding Examples for Round-to-Nearest Mode 
Figure 10 illustrates four examples of the round-to-nearest 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be represented exactly in the 
integer format. 


Example 1: 
In Figure 10(a), the infinitely precise result of an operation is: 
2^10 - 2^-2 = 00...001111111111.11 


The result is rounded to the closest representable integer 
value, 


2^10 = 00...010000000000 

Example 2: 
In Figure 10(b), the infinitely precise result of an operation is: 
2^10 + 2^0 + 2^-3 = 00...010000000001.001 


This result is rounded to the closest representable integer 
value, 


2^10 + 2^0 = 00...010000000001 

Example 3: 
In Figure 10(c), the infinitely precise result of an operation is: 
-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 


This result is exactly halfway between two representable 
integer values. Accordingly, it is rounded to the closest 
representation with an LSB of zero, or 


-(2^10 + 2*2^0) = 11...101111111110 

Example 4: 
In Figure 10(d), the infinitely precise result of an operation is: 
2^10 + 3*2^0 = 00...010000000011 


This result can be represented exactly in the integer format, 
and is left unaltered by the rounding process. 


[Figure 10: number lines (a) through (d) for Examples 1 through 4; drawing AF004560] 

Figure 10. Integer Rounding Examples for Round-to-Nearest Mode 
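

For an integer destination, each rounding mode corresponds to a standard rounding function. The sketch below models floating-point-to-integer conversion of an exact value, including the integer overflow rule stated earlier (quiet NAN 7FA00000₁₆ with the invalid and NAN flags). It relies on Python's `round()` applied to a `Fraction`, which rounds ties to the even integer; the function name `fp_to_int` and the flag names are hypothetical.

```python
from fractions import Fraction
import math

ROUNDERS = {
    "nearest": round,              # ties go to the even integer (LSB of zero)
    "toward_neg_inf": math.floor,
    "toward_pos_inf": math.ceil,
    "toward_zero": math.trunc,
}


def fp_to_int(x, mode):
    """Convert an exact value to a 32-bit two's-complement word.
    Returns (word, set of flags that go HIGH)."""
    n = ROUNDERS[mode](x)
    if n > 2 ** 31 - 1 or n < -(2 ** 31):
        return 0x7FA00000, {"invalid", "nan"}   # integer overflow
    flags = set()
    if n != x:
        flags.add("inexact")
    if n == 0:
        flags.add("zero")
    return n & 0xFFFFFFFF, flags


# Example 3 above: -(2^10 + 2^0 + 2^-1) is halfway between -1025 and -1026.
word, flags = fp_to_int(Fraction(-2051, 2), "nearest")
print(hex(word), flags)   # 0xfffffbfe {'inexact'}
```

The word FFFFFBFE₁₆ is the two's-complement form of -(2^10 + 2*2^0), matching the bit pattern 11...101111111110 in Example 3.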
Round Toward -∞: In this rounding mode the result of an 
operation is rounded to the closest representation that is less 
than or equal to the infinitely precise result, and which fits the 
destination format. Rounding is performed both for floating- 
point and integer destination formats. 


Figure 11 illustrates four examples of the round toward -∞ 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is repre- 
sented by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 
In Figure 11(a), the infinitely precise result of an operation is: 
2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 


2^20 = 1.00000000000000000000000 x 2^20 

Example 2: 
In Figure 11(b), the infinitely precise result of an operation is: 
2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 


2^20 - 2^-4 = 1.11111111111111111111111 x 2^19 

Example 3: 
In Figure 11(c), the infinitely precise result of an operation is: 
-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 


-(2^20 + 2*2^-3) = -1.00000000000000000000010 x 2^20 

Example 4: 
In Figure 11(d), the infinitely precise result of an operation is: 
2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 


This result can be represented exactly in the floating-point 
format, and is left unaltered by the rounding process. 


[Figure 11: number lines (a) through (d) for Examples 1 through 4; drawing AF004570] 

Figure 11. Floating-Point Rounding Examples for Round Toward -∞ Mode 
Figure 12 illustrates four examples of the round toward -∞ 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be exactly represented in the 
integer format. 


Example 1: 
In Figure 12(a), the infinitely precise result of an operation is: 
2^10 - 2^-2 = 00...001111111111.11 


The result is rounded to the next-smaller representable 
integer value, 


2^10 - 2^0 = 00...001111111111 

Example 2: 
In Figure 12(b), the infinitely precise result of an operation is: 
2^10 + 2^0 + 2^-3 = 00...010000000001.001 


This result is rounded to the next-smaller representable 
integer value, 


2^10 + 2^0 = 00...010000000001 

Example 3: 
In Figure 12(c), the infinitely precise result of an operation is: 
-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 


This result is rounded to the next-smaller representable 
integer value: 


-(2^10 + 2*2^0) = 11...101111111110 

Example 4: 
In Figure 12(d), the infinitely precise result of an operation is: 
2^10 + 3*2^0 = 00...010000000011 


This result can be represented exactly in the integer format, 
and is unaltered by the rounding process. 


[Figure 12: number lines (a) through (d) for Examples 1 through 4; drawing AF004580] 

Figure 12. Integer Rounding Examples for Round Toward -∞ Mode 
Round Toward +∞: In this rounding mode the result of an 
operation is rounded to the closest representation that is 
greater than or equal to the infinitely precise result, and which 
fits the destination format. Rounding is performed both for 
floating-point and integer destination formats. 


Figure 13 illustrates four examples of the round toward +∞ 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is repre- 
sented by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 
In Figure 13(a), the infinitely precise result of an operation is: 
2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-larger floating-point 
representation: 


2^20 + 2^-3 = 1.00000000000000000000001 x 2^20 

Example 2: 
In Figure 13(b), the infinitely precise result of an operation is: 
2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-larger floating-point 
representation: 


2^20 = 1.00000000000000000000000 x 2^20 

Example 3: 
In Figure 13(c), the infinitely precise result of an operation is: 
-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to the next-larger floating-point 
representation: 


-(2^20 + 2^-3) = -1.00000000000000000000001 x 2^20 

Example 4: 
In Figure 13(d), the infinitely precise result of an operation is: 
2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 


This result can be represented exactly in the floating-point 
format, and no rounding takes place. 


[Figure 13: number lines (a) through (d) for Examples 1 through 4; drawing AF004590] 

Figure 13. Floating-Point Rounding Examples for Round Toward +∞ Mode 
Figure 14 illustrates four examples of the round toward +∞ 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be exactly represented in the 
integer format. 


Example 1: 
In Figure 14(a), the infinitely precise result of an operation is: 
2^10 - 2^-2 = 00...001111111111.11 


The result is rounded to the next-larger representable 
integer value, 


2^10 = 00...010000000000 

Example 2: 
In Figure 14(b), the infinitely precise result of an operation is: 
2^10 + 2^0 + 2^-3 = 00...010000000001.001 


This result is rounded to the next-larger representable 
integer value, 


2^10 + 2*2^0 = 00...010000000010 

Example 3: 
In Figure 14(c), the infinitely precise result of an operation is: 
-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 


This result is rounded to the next-larger representable 
integer value: 


-(2^10 + 2^0) = 11...101111111111 

Example 4: 
In Figure 14(d), the infinitely precise result of an operation is: 
2^10 + 3*2^0 = 00...010000000011 


This result can be represented exactly in the integer 
format, and no rounding takes place. 


[Figure 14: number lines (a) through (d) for Examples 1 through 4; drawing AF004600] 

Figure 14. Integer Rounding Examples for Round Toward +∞ Mode 
Round Toward 0: In this rounding mode the result of an 
operation is rounded to the closest representation whose 
magnitude is less than or equal to the infinitely precise result, 
and which fits the destination format. Rounding is performed 
both for floating-point and integer destination formats. 


Figure 15 illustrates four examples of the round toward 0 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is repre- 
sented by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 
In Figure 15(a), the infinitely precise result of an operation is: 
2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to: 


2^20 = 1.00000000000000000000000 x 2^20 

Example 2: 
In Figure 15(b), the infinitely precise result of an operation is: 
2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 


This result cannot be represented exactly in floating-point 
format, and is rounded to: 


2^20 - 2^-4 = 1.11111111111111111111111 x 2^19 

Example 3: 
In Figure 15(c), the infinitely precise result of an operation is: 
-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 


This result cannot be represented exactly in floating-point 
format, and is rounded to: 


-(2^20 + 2^-3) = -1.00000000000000000000001 x 2^20 

Example 4: 
In Figure 15(d), the infinitely precise result of an operation is: 
2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 


This result can be represented exactly in the floating-point 
format, and is unaffected by the rounding process. 


[Figure 15: number lines (a) through (d) for Examples 1 through 4; drawing AF004610] 

Figure 15. Floating-Point Rounding Examples for Round Toward 0 Mode 
Figure 16 illustrates four examples of the round toward 0 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be exactly represented in the 
integer format. 


Example 1: 
In Figure 16(a), the infinitely precise result of an operation is: 
2^10 - 2^-2 = 00...001111111111.11 
The result is rounded to: 
2^10 - 2^0 = 00...001111111111 

Example 2: 
In Figure 16(b), the infinitely precise result of an operation is: 
2^10 + 2^0 + 2^-3 = 00...010000000001.001 
The result is rounded to: 
2^10 + 2^0 = 00...010000000001 

Example 3: 
In Figure 16(c), the infinitely precise result of an operation is: 
-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 
The result is rounded to: 
-(2^10 + 2^0) = 11...101111111111 

Example 4: 
In Figure 16(d), the infinitely precise result of an operation is: 
2^10 + 3*2^0 = 00...010000000011 


This result can be represented exactly in the integer format, 
and is unaffected by the rounding process. 


[Figure 16: number lines (a) through (d) for Examples 1 through 4; drawing AF004620] 

Figure 16. Integer Rounding Examples for Round Toward 0 Mode 


Flag Operation 


The Am29325 generates six status flags to monitor floating- 
point processor operation. The following is a summary of flag 
conventions in IEEE mode: 


Invalid Operation Flag: The invalid operation flag is HIGH 
when an input operand is invalid for the operation to be 
performed. Table 4 lists the cases for which the invalid 
operation flag is HIGH in IEEE mode, and the corresponding 
final result. In cases where the invalid operation flag is HIGH, 
the overflow, underflow, zero, and inexact flags are LOW; the 
NAN flag will be HIGH. 


Overflow Flag: The overflow flag is HIGH if an R PLUS S, R 
MINUS S, R TIMES S, or 2 MINUS S operation with finite input 
operand(s) produces a result which, after rounding, has a 
magnitude greater than or equal to 2^128. The final result will 
be +∞ or -∞. 


Underflow Flag: The underflow flag is HIGH if an R PLUS S, 
R MINUS S, or R TIMES S operation produces a result which, 
after rounding, has a magnitude in the range: 

0 < magnitude < 2^-126. 

The final result will be +0 (00000000₁₆) if the rounded result is 
non-negative, and -0 (80000000₁₆) if the rounded result is 
negative. 


Inexact Flag: The inexact flag is HIGH if the final result of an 
R PLUS S, R MINUS S, R TIMES S, 2 MINUS S, INT-TO-FP, or 
FP-TO-INT operation is not equal to the infinitely precise 
result. Note that if the underflow or overflow flag is HIGH, the 
inexact flag will also be HIGH. 


Zero Flag: The zero flag is HIGH if the final result of an 
operation is zero. For operations producing an IEEE floating- 
point number, the flag accompanies outputs +0 (00000000₁₆) 
and -0 (80000000₁₆). For operations producing an integer, 
the flag accompanies the output 0 (00000000₁₆). 


NAN Flag: The NAN flag is HIGH if an R PLUS S, R MINUS S, 
R TIMES S, 2 MINUS S, or FP-TO-INT operation produces a 
NAN as a final result. 

Operation in DEC Mode 


When input signal IEEE/DEC is LOW, the DEC mode of 
operation is selected. In this mode the Am29325 uses the 
single-precision floating-point format (floating F) set forth in 
Digital Equipment Corporation's VAX Architecture Manual. In 
addition, the DEC mode complies with most other aspects of 
single-precision floating-point operation outlined in the 
manual; differences are discussed in Appendix B. 


DEC Floating-Point Format 


The DEC single-precision floating-point word is 32 bits wide, 
and is arranged in the format shown in Figure 17. The floating- 
point word is divided into three fields: a single-bit sign, an 8-bit 
biased exponent, and a 23-bit fraction. 


The sign bit indicates the sign of the floating-point number's 
value. Non-negative values have a sign of 0, negative values a 
sign of 1. 


The biased exponent is an 8-bit unsigned integer field repre- 
senting a multiplicative factor of some power of two. The bias 
value is 128. If, for example, the multiplicative factor for a 
floating-point number is to be 2^a, the value of the biased 
exponent would be a + 128; "a" is called the true exponent. 


The fraction is a 23-bit unsigned fractional field containing the 
23 LSBs of the floating-point number's 24-bit mantissa. The 
weight of this field's MSB is 2^-2; the weight of the LSB is 2^-24. 


A floating-point number is evaluated or interpreted per the 
following conventions: 


let s = sign bit 
    e = biased exponent 
    f = fraction 


if e = 0 and s = 0 ... value = 0 
if e = 0 and s = 1 ... value = DEC-reserved operand 
if 0 < e ≤ 255 ... value = (-1)^s * (2^(e-128)) * (.1f) 
(normalized number) 
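

The evaluation conventions above translate directly into a small decoder. The sketch below is illustrative only: the function name `dec_decode` is hypothetical, it returns `None` for a DEC-reserved operand, and it assumes the word layout used in this section (sign in bit 31, biased exponent in bits 30 through 23, fraction in bits 22 through 0).

```python
from fractions import Fraction


def dec_decode(word):
    """Interpret a 32-bit DEC-mode word; None means DEC-reserved operand."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 0:
        return Fraction(0) if s == 0 else None
    # Mantissa .1f: an implied leading 1 of weight 2^-1 ahead of the fraction.
    mantissa = Fraction((1 << 23) | f, 1 << 24)
    value = mantissa * Fraction(2) ** (e - 128)
    return -value if s else value


print(dec_decode(0x41600000))   # 7/2, i.e. +3.5
print(dec_decode(0xC2360000))   # -91/8, i.e. -11.375
print(dec_decode(0x80012345))   # None (DEC-reserved operand)
```

The words 41600000₁₆ and C2360000₁₆ are the encodings of +3.5 and -11.375 worked out in this section's format examples.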


Zero: The value zero always has a sign of zero. 


DEC-Reserved Operand: A DEC-reserved operand does not 
represent a numeric value, but is interpreted as a signal or 
symbol. DEC-reserved operands are used to indicate invalid 
operations and operations whose results have overflowed the 
destination format. They may also be used to pass symbolic 
information from one calculation to another. 


Normalized Number: A normalized number represents a 
quantity with magnitude greater than or equal to 2^-128 but 
less than 2^127. 


[Figure 17: 32-bit word layout, MSB to LSB: sign bit (S), 8-bit biased exponent (E), 23-bit fraction (F) with bit weights 2^-2 through 2^-24; drawing TB000671] 

VALUE = (-1)^S (2^(E-128)) (.1F) 

Figure 17. DEC-Mode Floating-Point Format 


Example 1: 


The number +3.5 can be represented in floating-point 
format as follows: 


+3.5 = 11.1₂ x 2^0 
     = .111₂ x 2^2 


sign = 0 


biased exponent = 2₁₀ + 128₁₀ = 130₁₀ 
                = 10000010₂ 


fraction = 11000000000000000000000₂ 
(the leading 1 is implied in the format) 


Concatenating these fields produces the floating-point word 
41600000₁₆. 


Example 2: 


The number -11.375 can be represented in floating-point 
format as follows: 


-11.375 = -1011.011₂ x 2^0 
        = -.1011011₂ x 2^4 


sign = 1 


biased exponent = 4₁₀ + 128₁₀ = 132₁₀ 
                = 10000100₂ 


fraction = 01101100000000000000000₂ 
(the leading 1 is implied in the format) 


Concatenating these fields produces the floating-point word 
C2360000₁₆. 


DEC Mode Integer Format 


DEC mode integer format is identical to that of the IEEE mode. 
Integer numbers are represented as 32-bit, two's-complement 
words (Figure 8 depicts the integer format). The integer word 
can represent a range of integer values from -2^31 to 2^31 - 1. 


Operations 


All eight floating-point ALU operations discussed in the 
General Description section can be performed in DEC mode. 
Various exceptional aspects of the R PLUS S, R MINUS S, R 
TIMES S, 2 MINUS S, INT-TO-FP, and FP-TO-INT operations 
for this mode are described below. The IEEE-TO-DEC and 
DEC-TO-IEEE operations are discussed separately in the 
IEEE-TO-DEC and DEC-TO-IEEE Operations section. 


Operations with DEC-Reserved Operands: DEC-reserved 
operands arise in two ways: 1) they can be generated by the 
Am29325 to indicate that an invalid operation or floating-point 
overflow has taken place, or 2) be provided by the user as an 
input operand. 


When a DEC-reserved operand appears as an input operand, 
the final result of the operation is the same DEC-reserved 
operand. If an operation has two DEC-reserved operands as 
inputs, the DEC-reserved operand on the R port becomes the 
final result. 


The NAN flag will be HIGH whenever an operation produces a 
DEC-reserved operand as a final result. 


Example 1: 


Suppose the floating-point addition operation is performed 
with the following input operands: 


R port: 40800000₁₆ (0.1₂ x 2^1) 
S port: 80012345₁₆ (DEC-reserved operand) 


Result: This operation produces the DEC-reserved operand 
on the S port, 80012345₁₆, as the final result. The 
NAN flag will be HIGH. 


Example 2: 


Suppose the floating-point multiplication operation is per- 
formed with the following input operands: 


R port: 80765432₁₆ (DEC-reserved operand) 
S port: 80000001₁₆ (DEC-reserved operand) 


Result: Since both input operands are DEC-reserved 
operands, the operand on the R port, 80765432₁₆, is the 
final result of the operation. The NAN flag will be 
HIGH. 
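

The pass-through rule for DEC-reserved operands shown in the two examples above can be sketched as follows. The helper names are hypothetical, and the sketch covers only the selection of the final result, not the flag outputs.

```python
def is_dec_reserved(word):
    """DEC-reserved operand: sign bit of 1 with a biased exponent of 0."""
    return ((word >> 31) & 1) == 1 and ((word >> 23) & 0xFF) == 0


def dec_reserved_result(r, s):
    """Final result when an operand is DEC-reserved; R wins if both are."""
    if is_dec_reserved(r):
        return r
    if is_dec_reserved(s):
        return s
    raise ValueError("neither operand is a DEC-reserved operand")


print(hex(dec_reserved_result(0x40800000, 0x80012345)))  # 0x80012345
print(hex(dec_reserved_result(0x80765432, 0x80000001)))  # 0x80765432
```

Unlike the IEEE-mode NAN handling, the operand is passed through unmodified; no bits are forced.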


Operations Producing Overflows: If an operation produces 
a rounded result that is too large to fit in the destination 
format, that operation is said to have overflowed. 


A floating-point overflow occurs if an R PLUS S, R MINUS S, R 
TIMES S, or 2 MINUS S operation with finite input operand(s) 
produces a result which, after rounding, has a magnitude 
greater than or equal to 2^127. The final result in such cases will 
be DEC-reserved operand 80000000₁₆; the overflow, inexact, 
and NAN flags will be HIGH. 


Integer overflow occurs when the floating-point-to-integer 
conversion operation attempts to convert to integer a floating- 
point number which, after rounding, is greater than 2^31 - 1 or 
less than -2^31. The final result in such cases will be DEC- 
reserved operand 80000000₁₆; the invalid operation flag will 
be HIGH. Note that the overflow and inexact flags remain 
LOW for integer overflow. 


Operations Producing Underflows: If an operation produces 
a floating-point result which, after rounding, has a magnitude 
too small to be expressed as a normalized floating-point 
number, but greater than 0, that operation is said to have 
underflowed. Underflow occurs when an R PLUS S, R MINUS 
S, or R TIMES S operation produces a result which, after 
rounding, has the magnitude: 


0 < magnitude < 2^-128


The final result in such cases will be 0 (00000000_16). The
underflow, inexact, and zero flags will be HIGH.


Underflow does not occur if the destination format is integer. If
the infinitely precise result of a floating-point-to-integer
conversion has a magnitude greater than 0 and less than 1, but
the rounded result is 0, the underflow flag remains LOW.


Invalid Operations: If an input operand is invalid for the 
operation to be performed, that operation is considered 
invalid. There is only one invalid operation in DEC mode: 
performing a floating-point-to-integer conversion on a value 
too large to be converted to an integer. In this case, the final 
result will be DEC-reserved operand 80000000_16, and the
invalid operation and NAN flags will be HIGH.


Sign Bit


For all operations producing a DEC floating-point result, the
sign bit of the final result is unambiguous; i.e., there is only one
sign bit value that yields a numerically correct result.


Rounding 


There are four rounding modes for DEC operation: 1) round to
nearest, 2) round toward +∞, 3) round toward -∞, and 4)
round toward 0. The round toward +∞, round toward -∞, and
round toward 0 modes are performed in a manner identical to 
that for IEEE operation; refer to the Rounding section under 
Operation in IEEE Mode. The round to nearest mode is 
similar to that for IEEE operation, but differs in one respect: for 
the case in which the infinitely precise result of an operation is 
exactly halfway between two representable values, DEC round 
to nearest mode rounds to the value with the larger magni- 
tude, rather than to the value whose LSB is 0. 
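
The difference between the two round-to-nearest rules only shows up on exact ties. The following Python sketch illustrates it at integer granularity (the LSB has weight 1) using exact rational arithmetic. It is a simplified model for illustration; the function name and the `dec_mode` parameter are hypothetical and do not correspond to device pins.

```python
from fractions import Fraction
import math


def round_to_nearest(x: Fraction, dec_mode: bool) -> int:
    """Round x to an integer; ties are resolved per the selected mode."""
    lower = math.floor(x)
    remainder = x - lower
    half = Fraction(1, 2)
    if remainder < half:
        return lower
    if remainder > half:
        return lower + 1
    # Exactly halfway between two representable values.
    if dec_mode:
        # DEC: choose the value with the larger magnitude.
        return lower + 1 if x > 0 else lower
    # IEEE: choose the value whose LSB is 0 (the even neighbor).
    return lower if lower % 2 == 0 else lower + 1
```

For the tie 2.5, the DEC rule in this model gives 3 while the IEEE rule gives 2; for 3.5 both give 4.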


Flag Operation 


The Am29325 generates six status flags to monitor floating- 
point processor operation. The following is a summary of flag 
operation in DEC mode: 


Invalid Operation Flag: The invalid operation flag is HIGH if
the FP-TO-INT operation is performed on a floating-point
number too large to be converted to an integer. The final result
for such an operation will be the DEC-reserved operand
80000000_16.


Overflow Flag: The overflow flag is HIGH if an R PLUS S, R
MINUS S, R TIMES S, or 2 MINUS S operation produces a
result which, after rounding, has a magnitude greater than or
equal to 2^127. The final result will be the DEC-reserved
operand 80000000_16.


Underflow Flag: The underflow flag is HIGH if an R PLUS S,
R MINUS S, or R TIMES S operation produces a result which,
after rounding, has a magnitude in the range:


0 < magnitude < 2^-126.


The final result will be 0 (00000000_16) in such cases.


Inexact Flag: The inexact flag is HIGH if the final result of an 
R PLUS S, R MINUS S, R TIMES S, 2 MINUS S, INT-TO-FP, or 
FP-TO-INT operation is not equal to the infinitely precise 
result. Note that if the underflow or overflow flag is HIGH, the 
inexact flag will also be HIGH. 


Zero Flag: The zero flag is HIGH if the final result of an
operation is 0. For operations producing an integer or a DEC
floating-point number, the flag accompanies the output 0
(00000000_16). (It should be noted that any operation
producing a floating-point 0 in DEC mode will output 00000000_16.)


NAN Flag: The NAN flag is HIGH if an R PLUS S, R MINUS S,
R TIMES S, 2 MINUS S, or FP-TO-INT operation produces a
DEC-reserved operand as the final result.


IEEE-TO-DEC and DEC-TO-IEEE Operations 


The IEEE-TO-DEC and DEC-TO-IEEE operations are used to 
convert floating-point numbers between the IEEE and DEC 
formats. Both operations work in a manner independent of the 
IEEE/DEC mode control.


IEEE-TO-DEC Conversion 


The operation converts an IEEE floating-point number to DEC 
floating-point format. Most conversions are exact; in no case 
does the round mode have any effect on the final result. There 
are, however, a few exceptional cases: 


a) If the IEEE floating-point input has a magnitude greater than
or equal to 2^127, it is too large to be represented by a DEC
floating-point number. The final result will be the
DEC-reserved operand 80000000_16; the overflow, inexact, and
NAN flags will be HIGH.


b) If the IEEE floating-point input is a NAN, the final result will
be the DEC-reserved operand 80000000_16; the invalid and
NAN flags will be HIGH.


c) If the IEEE floating-point input is a denormalized number,
the final result will be a DEC 0 (00000000_16); the zero flag
will be HIGH.


d) If the IEEE floating-point input is +0 or -0, the final result
will be a DEC 0 (00000000_16); the zero flag will be HIGH.


DEC-TO-IEEE Conversion


This operation converts a DEC floating-point number to IEEE
floating-point format. Most conversions are exact; in no case
does the round mode have any effect on the final result. There
are, however, a few exceptional cases:


a) If the DEC floating-point input is not 0, but has a magnitude
less than 2^-126, it is too small to be expressed as a
normalized IEEE floating-point number. The final result will
be an IEEE floating-point 0 having the same sign as the
input (00000000_16 for positive inputs and 80000000_16 for
negative inputs); the underflow, inexact, and zero flags will
be HIGH.


b) If the DEC floating-point input is a DEC-reserved operand,
the result will be quiet NAN 7FA00000_16; the invalid
operation and NAN flags will be HIGH.


c) If the DEC floating-point input is 0, the final result will be
IEEE floating-point +0 (00000000_16); the zero flag will be
HIGH.


APPLICATIONS


Suggestions for Power and Ground Pin
Connections


The Am29325 operates in an environment of fast signal rise
times and substantial switching currents. Therefore, care must
be exercised during circuit board design and layout, as with
any high-performance component. The following is a
suggested layout, but since systems vary widely in electrical
configuration, an empirical evaluation of the intended layout is
recommended.


The VCCT and GNDT pins, which carry output driver switching
currents, tend to be electrically noisy. The VCCE and GNDE
pins, which supply the ECL core of the device, tend to produce
less noise, and the circuits they supply may be adversely
affected by noise spikes on the VCC plane. For this reason, it
is best to provide isolation between the VCCE and VCCT pins,
as well as independent decoupling for each. Isolating the
GNDE and GNDT pins is not required.


Printed Circuit-Board Layout Suggestions


1) Use of a multilayer PC board with separate power, ground,
and signal planes is highly recommended.


2) All VCCE and VCCT pins should be connected to the VCC
plane. VCCT pins should be isolated from VCCE pins by means
of a slot cut in the VCC plane (see Figure 18). By physically
separating the VCCE and VCCT pins, coupled noise will be
reduced.


3) All GNDE and GNDT pins should be connected directly to
the ground plane.


4) The VCCT pins should be decoupled to ground with a 0.1-µF
ceramic capacitor and a 10-µF electrolytic capacitor, placed
as closely to the Am29325 as is practical. VCCE pins should
be decoupled to ground in a similar manner. A suggested
layout is shown in Figure 18.


Figure 18. Suggested Printed-Circuit Board Layout (Power and Ground Connections)


Figure 19. Am29325 Thermal Characteristics (Typical)
APPENDIX A


DIFFERENCES BETWEEN THE IEEE 
PROPOSED STANDARD FOR BINARY 
FLOATING-POINT ARITHMETIC AND THE 
Am29325'S IEEE MODE 


When operated in IEEE mode, the Am29325 High-Speed
Floating-Point Processor complies with the single-precision 
portion of the IEEE Proposed Standard for Binary Floating- 
Point Arithmetic (P754, draft 10.0) in most respects. There are, 
however, several differences: 


Denormalized Numbers 


The Am29325 does not handle denormalized numbers. A 
denormalized input will be converted to zero of the same sign 
before the specified operation takes place. The operation 
proceeds in exactly the same manner as if the input were +0 
or -0, producing the same numerical result and flags. 


If the result of an operation, after rounding, has a magnitude
smaller than 2^-126, the result is replaced by a zero of the
same sign.


Representation of Overflows


In some rounding modes the proposed IEEE standard requires
that overflows be represented as the format's most-positive or
most-negative finite number. In particular:


- When rounding toward 0, all overflows should produce a
result of the largest representable finite number with the
sign of the intermediate result.


- When rounding toward -∞, all positive overflows should
produce a result of the largest representable positive finite
number.


- When rounding toward +∞, all negative overflows should
produce a result of the largest representable negative finite
number.


The Am29325, however, always represents positive overflows
as +∞ and negative overflows as -∞, regardless of rounding
mode.


Projective Mode


The proposed IEEE standard provides only for an affine mode
to control the handling of infinities. The Am29325 provides
both affine and projective modes; the desired mode can be
selected by the user.


Traps


The proposed IEEE standard stipulates that the user be able
to request a trap on any exception. The Am29325 does not
support trapped operation, and behaves as if traps are
disabled.


Resetting of Flags


The proposed IEEE standard states that once an exception
flag has been set, it is reset only at the user's request. The
Am29325's flags, however, reflect the status of the most
recent operation.


Generation of the Underflow Flag


The proposed IEEE standard suggests several possible
criteria for determining if underflow occurs. These criteria generate
underflow flags that differ in subtle ways. The underflow
criteria chosen for the Am29325 stipulate that underflow
occurs if:


a) the rounded result of an operation has a magnitude in the
range:


0 < magnitude < 2^-126,
and
b) the final result is not equal to the infinitely precise result.


Since the Am29325 never produces a denormalized number
as the final result of a calculation, condition (b) is true
whenever (a) is true. Note then that the operation of the
Am29325's underflow flag is somewhat different from that of
an "IEEE standard" system using the same underflow criteria.
For example, if an operation should produce an infinitely
precise result that is exactly 2^-127, an "IEEE standard"
system would produce that value as the final result, expressed
as a denormalized number. Since that system's final result is
exact, the underflow flag would remain LOW. The Am29325,
on the other hand, would output zero; since its final result is
not exact, the underflow flag would be HIGH.


APPENDIX B


DIFFERENCES BETWEEN DEC VAX AND
Am29325 DEC MODE


Operation in DEC mode complies with most aspects of
single-precision floating-point operation outlined in the Digital
Equipment Corporation's VAX Architecture Manual. However, there
are some differences that should be noted:


Format


The Am29325's DEC format is:


sign - bit 31
exponent - bits 30-23
mantissa - bits 22-0


The VAX format is: 


sign - bit 15
exponent - bits 14-7
mantissa - bits 6-0, bits 31-16


In both cases, fields are listed from MSB to LSB, with bit 31 
the MSB of the 32-bit word. The Am29325's DEC format can 
be converted to VAX format by swapping the 16 LSBs and 16 
MSBs of the 32-bit word. 
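
The half-word swap described above is simple to express in software. The following Python sketch shows one way to do it; the function name is hypothetical, and the same routine converts in either direction because swapping twice restores the original word.

```python
def swap_halfwords(word: int) -> int:
    """Exchange the 16 MSBs and 16 LSBs of a 32-bit word."""
    word &= 0xFFFFFFFF
    return ((word << 16) | (word >> 16)) & 0xFFFFFFFF


# Am29325 DEC-format word: sign in bit 31, exponent in bits 30-23.
am29325_word = 0x40800000
# After the swap the sign is in bit 15 and the exponent in bits 14-7.
vax_word = swap_halfwords(am29325_word)
```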


Flags vs. Exceptions 


In DEC VAX operation, certain unusual conditions arising 
during system operation may incur an exception, or an 
indication to the operating system that special handling is 
needed. 


The VAX recognizes a number of arithmetic exceptions. The 
following exceptions are relevant to the operations supported 
by the Am29325: 


Integer Overflow Trap: indicates that the last operation
produced an integer overflow. The LSBs of the correct result
are stored in the destination operand.


Floating-Point Overflow Trap/Fault: indicates that the last
operation produced, after normalization and rounding, a
floating-point number with magnitude greater than or equal to 2^127.
A trap replaces the destination operand with the
DEC-reserved operand 80000000_16; a fault leaves the destination
operand unchanged.


Floating-Point Underflow Trap/Fault: indicates that the last
operation produced, after normalization and rounding, a
floating-point number with magnitude less than 2^-128. A trap
replaces the destination operand with zero; a fault leaves the
destination operand unchanged.


Reserved Operand Fault: indicates that the last operation 
had a reserved operand as an input. The destination operand 
is unchanged. 


The Am29325 does not directly support DEC traps and faults. 
Rather, it indicates unusual conditions by setting one or more 
of the six status flags HIGH. Table D2 describes flag operation 
in DEC mode. 


Integer Overflow 


In cases of integer overflow, the VAX signals the integer 
overflow trap and stores the LSBs of the correct result. The 
Am29325 sets the invalid operation flag and outputs the
DEC-reserved operand 80000000_16.


Floating-Point Underflow/Overflow Operation


The VAX Architecture Manual specifies the action to be taken
on the destination operand when floating-point underflow or
overflow is encountered. The Am29325 has no immediate
control over this destination operand, as it resides somewhere
off-chip, either in a register or memory location. This isn't so
much a difference between the VAX specification and
Am29325 operation as it is a difference in scope.


The Am29325 responds to floating-point underflow by
producing a final result of 0 (00000000_16); the underflow, inexact,
and zero flags will be HIGH. It responds to floating-point
overflow by producing the DEC-reserved operand 80000000_16
as the final result; the overflow, inexact, and NAN flags will be
HIGH.


Handling of DEC-Reserved Operands


If an operation has a DEC-reserved operand as an input, the
Am29325 will produce that operand as the final result. If an
operation has two input arguments and both are
DEC-reserved operands, the operand on port R becomes the final
result. For the VAX, operations with a DEC-reserved operand
input or inputs do not modify the destination operand. As
mentioned above, control of the destination operand is
beyond the scope of the Am29325's operation.


Inexact Flag


The Am29325 provides an inexact flag to indicate that the final
result produced by an operation is not equal to the infinitely
precise result. The VAX does not provide this flag.


APPENDIX C


PERFORMING FLOATING-POINT DIVISION
ON THE Am29325


While the Am29325 does not have a floating-point division
instruction, it can be used to evaluate reciprocals. The
division:

C = A/B

can then be performed by evaluating:

C = A*(1/B)


Only a modest amount of external hardware is needed to
implement the reciprocal function.


The technique for calculating reciprocals is based on the
Newton-Raphson method for obtaining the roots of an
equation. The roots of equation:

F(x) = 0

can be found by iteratively evaluating the equation:

x_{i+1} = x_i - F(x_i)/F'(x_i)


The process begins by making a guess as to the value of x_i,
and using this guess or "seed" value to perform the first
iteration. Iterations are continued until the root is evaluated to
the desired accuracy. The number of iterations needed to
achieve a given accuracy depends both on the accuracy of the
seed value and the nature of F(x).


Now consider the equation:

F(x) = (1/x) - B


The root of F(x) is 1/B. The reciprocal of B, then, can be found
by using the Newton-Raphson method to find the root of F(x).
The iterative equation for finding the root is:


x_{i+1} = x_i - F(x_i)/F'(x_i)
        = x_i - (1/x_i - B)/(-(x_i)^-2)
        = x_i (2 - B*x_i)


It can be shown that, in order for this iterative equation to
converge, the seed value x_0 must fall in the range:


0 < x_0 < 2/B if B > 0
or 2/B < x_0 < 0 if B < 0


For example, if the reciprocal of 3 is to be evaluated, the seed
value must be between 0 and 2/3.


The error of x_i reduces quadratically; that is, if the error of x_i is
e, the error is reduced to order e^2 by the next iteration. The
number of bits of accuracy in the result, then, roughly doubles
after every iteration. While this is only an approximation of the
actual error produced, it is a handy rule of thumb for
determining the number of iterations needed to produce a
result of a certain accuracy, given the accuracy of the seed.


Example 1:
Find the reciprocal of 7.25.
Solution:
The seed value must fall in the range:


0 < x_0 < 2/7.25
or 0 < x_0 < .275862


Suppose x_0 is chosen to be .1:


Iteration 1: x_1 = x_0 (2 - B*x_0)
                 = .1(2 - (7.25)(.1))
                 = .1275


Iteration 2: x_2 = x_1 (2 - B*x_1)
                 = .1275(2 - (7.25)(.1275))
                 = .1371421875


Iteration 3: x_3 = x_2 (2 - B*x_2)
                 = .1371421875*(2 - (7.25)(.1371421875))
                 = .1379265230


The actual value of 1/7.25, to ten decimal places, is
.1379310345.


The error after each iteration is:


Iteration   x_i            Error to Ten Places
1           .1275          -.0104310345
2           .1371421875    -.0007888470
3           .1379265230    -.0000045115

Example 2:
Find the reciprocal of -.3.
Solution:
The seed value must fall in the range:


2/(-.3) < x_0 < 0
or -6.66 < x_0 < 0


Suppose x_0 is chosen to be -2.0:


Iteration 1: x_1 = x_0 (2 - B*x_0)
                 = -2.0(2 - (-.3)(-2.0))
                 = -2.8


Iteration 2: x_2 = x_1 (2 - B*x_1)
                 = -2.8(2 - (-.3)(-2.8))
                 = -3.248


Iteration 3: x_3 = x_2 (2 - B*x_2)
                 = -3.248(2 - (-.3)(-3.248))
                 = -3.3311488


Iteration 4: x_4 = x_3 (2 - B*x_3)
                 = -3.3311488*(2 - (-.3)(-3.3311488))
                 = -3.333331902


The actual value of 1/(-.3), to ten decimal places, is
-3.333333333.


The error after each iteration is:


Iteration   x_i             Error to Ten Places
1           -2.8            0.533333333
2           -3.248          0.085333333
3           -3.3311488      0.002184533
4           -3.333331902    0.000001431
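
The two worked examples can be reproduced with a short Python sketch of the iteration x_{i+1} = x_i (2 - B*x_i). It uses ordinary double-precision arithmetic rather than the Am29325's 32-bit format, so it illustrates the method only; the function names are hypothetical.

```python
def seed_in_range(b: float, seed: float) -> bool:
    """Convergence condition: 0 < seed < 2/B (B > 0) or 2/B < seed < 0 (B < 0)."""
    if b > 0:
        return 0 < seed < 2 / b
    if b < 0:
        return 2 / b < seed < 0
    return False  # B = 0 has no reciprocal


def newton_reciprocal(b: float, seed: float, iterations: int) -> list:
    """Return the successive approximations x_1, x_2, ... of 1/B."""
    if not seed_in_range(b, seed):
        raise ValueError("seed is outside the convergence range")
    x = seed
    approximations = []
    for _ in range(iterations):
        x = x * (2 - b * x)  # one Newton-Raphson step
        approximations.append(x)
    return approximations


example_1 = newton_reciprocal(7.25, 0.1, 3)
example_2 = newton_reciprocal(-0.3, -2.0, 4)
```

Printing `example_1` and `example_2` shows the same sequence of approximations as the iterations listed in the two examples, to within double-precision rounding.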


In order to implement the Newton-Raphson method on the 
Am29325, some means is needed to generate the seed used 
in the first iteration. One approach is to place a hardware seed 
look-up table between the R bus and the Am29325; see Figure
C1. A more detailed diagram of the look-up table appears in
Figure C2. 
TABLE C1. CONTENTS OF THE SEED EXPONENT PROM


Address (16)   Data (16)   Address (16)   Data (16)


Notes: 1. The reciprocals of these numbers are too large to be represented in the
selected format.
2. The reciprocals of these numbers are too small to be represented in
normalized IEEE format.


Figure C1. Adding a Hardware Look-Up Table to the Am29325


The look-up table has two sections: a biased exponent look-up
PROM, and a fraction look-up PROM. The seed-biased
exponent look-up table is stored in a 512-by-8-bit PROM. This
table consists of two sections: the DEC format section (which
occupies addresses 000-0FF_16), and the IEEE section
(which occupies addresses 100-1FF_16). The appropriate
table will be selected automatically if address line A8 is wired
to the Am29325's IEEE/DEC pin. The equations implemented
by these table sections are:


DEC table: seed biased exponent
= 257_10 - input biased exponent


IEEE table: seed biased exponent
= 253_10 - input biased exponent


Table C1 lists the contents of this PROM.
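
The two exponent equations can be turned into a table generator along the following lines. This Python sketch is an illustration only: it assumes that A8 HIGH selects the IEEE section, and it marks entries whose reciprocal cannot be represented with `None`, because the data actually programmed at those addresses is not recoverable from this text. The function name is hypothetical.

```python
def seed_exponent_table():
    """Build a 512-entry list indexed by PROM address (A8 = IEEE/DEC select)."""
    table = []
    for address in range(512):
        ieee_mode = bool(address & 0x100)   # assumed: A8 HIGH selects IEEE
        input_exponent = address & 0xFF     # A7-A0 carry the biased exponent
        seed_exponent = (253 if ieee_mode else 257) - input_exponent
        largest_valid = 254 if ieee_mode else 255
        representable = (input_exponent != 0
                         and 1 <= seed_exponent <= largest_valid)
        # None marks entries covered by Notes 1 and 2 of Table C1.
        table.append(seed_exponent if representable else None)
    return table


table = seed_exponent_table()
```

For the IEEE input 41CA6666_16 (25.3), whose biased exponent is 83_16, the sketch yields 7A_16, which agrees with the exponent field of the seed 3D21E800_16 quoted in Example 1 of the reciprocal procedure.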


The seed fraction look-up table is stored in one or more
PROMs, the number of PROMs depending on the desired
accuracy of the seed value. The hardware depicted in Figure
C2 uses two 4K-by-8-bit PROMs to implement a fraction
look-up table whose inputs are the 12 MSBs of the input
argument's fraction. These PROMs output the 16 MSBs of the
seed's fraction field; the remaining 7 bits of fraction are set
to 0. The equation implemented in this table is:


seed fraction = 2 / (1 + input fraction) - 1


where the value of the input fraction falls in the range


0 ≤ input fraction < 1


Note that the seed fraction must also be constrained to fall in
the range


0 ≤ seed fraction < 1


Therefore, if the input fraction is 0, the corresponding seed
fraction stored in the table must be .111...111_2, not 1.0_2. The
same seed fraction look-up table may be used for both IEEE
and DEC formats. Table C2 contains a partial listing for the
seed fraction look-up table shown in Figure C2.
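
The seed fraction equation can be checked numerically with a small Python sketch. It assumes the 12-bit PROM address is the input fraction scaled by 4096 and that an all-ones 16-bit fraction is the largest value the PROMs can hold; how the real table quantizes each entry to 16 bits is not stated here, so the sketch returns unquantized values. The function name and `out_bits` parameter are hypothetical.

```python
def seed_fraction(address: int, out_bits: int = 16) -> float:
    """Seed fraction for a 12-bit address (input fraction = address / 4096)."""
    input_fraction = address / 4096.0
    seed = 2.0 / (1.0 + input_fraction) - 1.0
    # Keep the seed strictly below 1: an input fraction of 0 would give 1.0,
    # so clamp to the largest fraction with all out_bits bits set.
    largest = 1.0 - 2.0 ** -out_bits
    return min(seed, largest)
```

Evaluating the sketch at addresses 001_16, 00C_16, and FFF_16 gives values that agree with the seed fractions in the partial listing of Table C2 to about nine decimal places.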
TABLE C2. CONTENTS OF THE SEED FRACTION PROMS


Address (16)   Value of Input Fraction (10)   Value of Seed Fraction (10)
000            0.0000000000                   0.9999999999
001            0.0002441406                   0.9995118370
002            0.0004882812                   0.9990239150
003            0.0007324219                   0.9985362280
004            0.0009765625                   0.9980487790
005            0.0012207031                   0.9975615710
006            0.0014648438                   0.9970745970
007            0.0017089844                   0.9965878630
008            0.0019531250                   0.9961013650
009            0.0021972656                   0.9956151030
00A            0.0024414063                   0.9951290800
00B            0.0026855469                   0.9946432920
00C            0.0029296875                   0.9941577400
 .                  .                              .
 .                  .                              .
FF6            0.9975585938                   0.0012221950
FF7            0.9978027344                   0.0010998410
FF8            0.9980468750                   0.0009775170
FF9            0.9982910156                   0.0008552230
FFA            0.9985351563                   0.0007329590
FFB            0.9987792969                   0.0006107240
FFC            0.9990234375                   0.0004885200
FFD            0.9992675781                   0.0003663450
FFE            0.9995117188                   0.0002442000
FFF            0.9997558594                   0.0001220850


Figure C2. The Hardware Look-Up Table
(Block diagram: the sign bit R31 of the R bus passes through as the seed
sign; the biased exponent R30-R23, with the IEEE/DEC line on A8, addresses
the 512 x 8 seed exponent PROM; the 12 MSBs of the fraction, R22-R11,
address the two Am27S43 4K x 8 seed fraction PROMs.)


With the hardware look-up table in place, the reciprocal of
value B can be calculated with the following series of
operations:


1) Place B on both the R and S buses. The 2:1 multiplexer at
the output of the hardware look-up table should select the
output of the look-up table (see Figure C3-A).


2) Load the seed value x_0 into register R and load B into
register S. Select the R TIMES S operation (see Figure
C3-B).


3) Load product B*x_0 into register F. Select the 2 MINUS S
operation, and select register F as the input to the ALU S
port (see Figure C3-C).


4) Load 2 - B*x_0 into register F. Select the R TIMES S
operation and select register F as the input to the ALU S
port (see Figure C3-D).


5) Load the value x_1 (x_1 = x_0 (2 - B*x_0)) into registers R and F.
Select the R TIMES S operation (see Figure C3-E).


6) Repeat steps 3 through 5 until the result has the accuracy
desired.
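
The register sequence in steps 2 through 5 can be modeled in software by rounding every intermediate result to IEEE single precision. The Python sketch below is a simplified model that assumes round-to-nearest and ignores flags, pipelining, and the look-up hardware (the seed is passed in directly); all function names are hypothetical.

```python
import struct


def bits_to_float(word: int) -> float:
    """Interpret a 32-bit word as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def float_to_bits(value: float) -> int:
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def to_single(value: float) -> float:
    """Round to single precision, as each result written to register F would be."""
    return bits_to_float(float_to_bits(value))


def reciprocal_trace(b_bits: int, seed_bits: int, iterations: int = 2):
    """Return (B*x, 2 - B*x, x_next) bit patterns for each iteration."""
    b = bits_to_float(b_bits)
    x = bits_to_float(seed_bits)
    trace = []
    for _ in range(iterations):
        product = to_single(b * x)               # step 3: R TIMES S
        correction = to_single(2.0 - product)    # step 4: 2 MINUS S
        x = to_single(x * correction)            # step 5: R TIMES S
        trace.append(tuple(float_to_bits(v) for v in (product, correction, x)))
    return trace


trace = reciprocal_trace(0x41CA6666, 0x3D21E800)  # B = 25.3 with its seed
```

With B = 41CA6666_16 and seed 3D21E800_16, the model settles on 3D21E5B1_16 after the first iteration and repeats that value on the second.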
Figure C3-A. Data Flow for Step 1 of the Reciprocal Procedure


Figure C3-B. Data Flow for Step 2 of the Reciprocal Procedure
(Register R holds x_0; register S holds B.)


Figure C3-C. Data Flow for Step 3 of the Reciprocal Procedure
(Register S holds B; register F holds B*x_0.)


Figure C3-D. Data Flow for Step 4 of the Reciprocal Procedure
(Register S holds B; register F holds 2 - B*x_0.)


Figure C3-E. Data Flow for Step 5 of the Reciprocal Procedure
(Registers R and F hold x_1 = x_0 (2 - B*x_0).)
A tabular description of the operations above is given in Table
C3. The following examples, performed in IEEE format,
illustrate the process.


Example 1:


Find the reciprocal of 25.3.


Solution: The IEEE floating-point representation for 25.3 is
41CA6666_16. The reciprocal process is begun by
feeding this value to both the seed look-up table
and port S. The look-up table produces the value
.03952789_10 (3D21E800_16). The reciprocal is
evaluated using the procedure described above;
register values for each step are given in Table C4.
The expected result, to the precision of the
floating-point word, is .03952569_10 (3D21E5B1_16). In
this case the expected result is produced after the
first iteration. All subsequent iterations produce the
same result, and are therefore unnecessary.

TABLE C3. SEQUENCE OF EVENTS FOR EVALUATING RECIPROCALS


Clock
Cycle   Operation    Register R          Register S   Register F
1       X            -                   -            -
2       R TIMES S    x_0                 B            -
3       2 MINUS S    x_0                 B            B*x_0                (first iteration)
4       R TIMES S    x_0                 B            2 - B*x_0            (first iteration)
5       R TIMES S    x_0 (2 - B*x_0)     B            x_0 (2 - B*x_0)      (first iteration)
6       2 MINUS S    x_1                 B            B*x_1                (second iteration)
7       R TIMES S    x_1                 B            2 - B*x_1            (second iteration)
8       R TIMES S    x_1 (2 - B*x_1)     B            x_1 (2 - B*x_1)      (second iteration)


X = DON'T CARE


TABLE C4. INPUT BUS AND REGISTER VALUES FOR EXAMPLE 1


Clock
Cycle   Register R       Register S      Register F
1       3D21E800_16      41CA6666_16
        (.03952789)      (25.3)
2       3D21E800_16      41CA6666_16
        (.03952789)      (25.3)
3       3D21E800_16      41CA6666_16     3F8001D3_16
        (.03952789)      (25.3)          (1.0000556)
4       3D21E800_16      41CA6666_16     3F7FFC5A_16
        (.03952789)      (25.3)          (.99994433)
5       3D21E5B1_16      41CA6666_16     3D21E5B1_16     <- Result of first
        (.03952569)      (25.3)          (.03952569)        iteration
6       3D21E5B1_16      41CA6666_16     3F7FFFFF_16
        (.03952569)      (25.3)          (.99999994)
7       3D21E5B1_16      41CA6666_16     3F800000_16
        (.03952569)      (25.3)          (1.0)
8       3D21E5B1_16      41CA6666_16     3D21E5B1_16     <- Result of second
        (.03952569)      (25.3)          (.03952569)        iteration
Example 2:


Find the reciprocal of -.4725.


Solution: The IEEE floating-point representation for -.4725
is BEF1EB85_16. The reciprocal process is begun
by feeding this value to both the seed look-up table
and port S. The look-up table produces the value
-2.11621094_10 (C0077000_16). The reciprocal is
evaluated using the procedure described above;
register values for each step are given in Table C5.
The expected result, to the precision of the
floating-point word, is -2.116402_10 (C0077322_16). In
this case the expected result is produced after the
first iteration. All subsequent iterations produce the
same result, and are therefore unnecessary.


TABLE C5. INPUT BUS AND REGISTER VALUES FOR EXAMPLE 2


Clock
Cycle   Register R       Register S      Register F
1       C0077000_16      BEF1EB85_16
        (-2.1162109)     (-0.4725)
2       C0077000_16      BEF1EB85_16
        (-2.1162109)     (-0.4725)
3       C0077000_16      BEF1EB85_16     3F7FFA14_16
        (-2.1162109)     (-0.4725)       (0.99990963)
4       C0077000_16      BEF1EB85_16     3F8002F6_16
        (-2.1162109)     (-0.4725)       (1.0000904)
5       C0077322_16      BEF1EB85_16     C0077322_16     <- Result of first
        (-2.116402)      (-0.4725)       (-2.116402)        iteration
6       C0077322_16      BEF1EB85_16     3F800000_16
        (-2.116402)      (-0.4725)       (1.0)
7       C0077322_16      BEF1EB85_16     3F800000_16
        (-2.116402)      (-0.4725)       (1.0)
8       C0077322_16      BEF1EB85_16     C0077322_16     <- Result of second
        (-2.116402)      (-0.4725)       (-2.116402)        iteration
APPENDIX D
SUMMARY OF FLAG OPERATION 


Tables D1, D2, and D3 summarize flag operation for the IEEE 
mode, the DEC mode, and for the IEEE-TO-DEC and DEC-TO- 
IEEE operations. 


TABLE D1. FLAG SUMMARY FOR IEEE MODE


Operation                  Condition                            INV  OVF  UNF  INE  ZER  NAN

Any operation listed
in the IEEE Invalid
Operations Table

R PLUS S                   Input operands are finite
R MINUS S                  |rounded result| ≥ 2^128
R TIMES S
2 MINUS S

R PLUS S                   0 < |rounded result| < 2^-126
R MINUS S
R TIMES S

R PLUS S                   Final result does not equal
R MINUS S                  infinitely precise result
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S                   Final result is zero
R MINUS S
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S                   Final result is a NAN
R MINUS S
R TIMES S
2 MINUS S
FP-TO-INT


Notes: INV = Invalid operation flag
OVF = Overflow flag
UNF = Underflow flag
INE = Inexact flag
ZER = Zero flag
NAN = NAN flag
L = LOW
H = HIGH
* = State of flag depends on the input operands and the operation
performed
TABLE D2. FLAG SUMMARY FOR DEC MODE


Operation                  Condition                            INV  OVF  UNF  INE  ZER  NAN

FP-TO-INT                  Rounded result > 2^31 - 1
                           or rounded result < -2^31

FP-TO-INT                  Input is a DEC-reserved
                           operand

R PLUS S                   |Rounded result| ≥ 2^127
R MINUS S
R TIMES S
2 MINUS S

R PLUS S                   0 < |rounded result| < 2^-126
R MINUS S
R TIMES S

R PLUS S                   Final result does not equal
R MINUS S                  infinitely precise result
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S                   Final result is zero
R MINUS S
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S                   Final result is a DEC-reserved
R MINUS S                  operand
R TIMES S
2 MINUS S
FP-TO-INT


Notes: INV = Invalid operation flag
OVF = Overflow flag
UNF = Underflow flag
INE = Inexact flag
ZER = Zero flag
NAN = NAN flag
L = LOW
H = HIGH
* = State of flag depends on the input operands and the operation
performed





TABLE D3. FLAG SUMMARY FOR IEEE-TO-DEC AND DEC-TO-IEEE CONVERSIONS 


[operation [onan [ww | ovr | unr | We] Zen | NaN 
TeeeTODeC | wteann iP Ht ft pt Pt 
peesrooeo iene fe ee 


DEC-TO-IEEE Input is a DEC-reserved operand 


DEC-TO-IEEE 0 <|rounded result| < 27 126 


Pa 
DEC-TO-IEEE Final result is zero H 
IEEE-TO-DEC 






me 
— 


r- 
a a 










Ht 
“ack 
a 


- 





Notes: INV = Invalid operation flag H = HIGH 
OVF = Overflow flag * = State of flag 
UNF = Underflow flag . depends on the 
INE = Inexact flag input operands 
ZER = Zero flag and the operation 
NAN = NAN flag performed 
L=LOW 
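
The conversion rules summarized in Table D3 and in the IEEE-TO-DEC and DEC-TO-IEEE sections can be sketched in Python as follows. This is an illustrative model, not the device's logic. It assumes the IEEE value is 1.f x 2^(e-127) and the Am29325 DEC value is 0.1f x 2^(e-128), so that the biased exponents differ by 2; it treats IEEE infinities as overflow; and it uses the quiet NAN value quoted in the text. Function names and flag strings are hypothetical.

```python
SIGN = 0x80000000
DEC_RESERVED = 0x80000000   # result quoted in the text for overflow and NAN
QUIET_NAN = 0x7FA00000      # quiet NAN value as quoted in the text


def split(word: int):
    """Return (sign bit, biased exponent, fraction) of a 32-bit word."""
    return word & SIGN, (word >> 23) & 0xFF, word & 0x7FFFFF


def ieee_to_dec(word: int):
    """Return (DEC word, set of flags that are HIGH)."""
    sign, exponent, fraction = split(word)
    if exponent == 0xFF and fraction != 0:       # NAN input
        return DEC_RESERVED, {"invalid", "nan"}
    if exponent >= 254:                          # magnitude >= 2^127
        return DEC_RESERVED, {"overflow", "inexact", "nan"}
    if exponent == 0:                            # +0, -0, or denormalized
        return 0x00000000, {"zero"}
    return sign | ((exponent + 2) << 23) | fraction, set()


def dec_to_ieee(word: int):
    """Return (IEEE word, set of flags that are HIGH)."""
    sign, exponent, fraction = split(word)
    if exponent == 0:
        if sign:                                 # DEC-reserved operand
            return QUIET_NAN, {"invalid", "nan"}
        return 0x00000000, {"zero"}              # DEC 0 (assumed for any fraction)
    if exponent <= 2:                            # magnitude < 2^-126
        return sign, {"underflow", "inexact", "zero"}
    return sign | ((exponent - 2) << 23) | fraction, set()
```

As a check on the exponent offset, IEEE 1.0 (3F800000_16) maps to 40800000_16 in this model, the same DEC word that appears as the R-port operand in the first DEC-reserved operand example.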
ABSOLUTE MAXIMUM RATINGS


Storage Temperature                     -65 to +150°C
Temperature Under Bias - TC             -55 to +125°C
Supply Voltage to Ground Potential
  Continuous                            -0.5 to +7.0 V
DC Voltage Applied to Outputs
  for HIGH State                        -0.5 V to +VCC Max.
DC Input Voltage                        -0.5 to +5.5 V
DC Output Current, into Outputs
DC Input Current


Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES


Commercial (C) Devices
Temperature, Case (TC)                  0 to +85°C
Supply Voltage (VCC)                    +4.75 to +5.25 V


Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating ranges unless otherwise specified 
Parameter Symbol | Parameter Description | Test Conditions (Note 1) | Limits | Units
VOH | Output HIGH Voltage | VCC = Min., VIN = VIH or VIL, IOH = -1.0 mA | | Volts
VOL | Output LOW Voltage | VCC = Min., VIN = VIH or VIL, IOL = 4.0 mA | | Volts
VIH | Input HIGH Level | Guaranteed Input Logical HIGH Voltage for All Inputs | | Volts
VIL | Input LOW Level | Guaranteed Input Logical LOW Voltage for All Inputs | | Volts
VI | Input Clamp Voltage | VCC = Min., IIN = -18 mA | | Volts
IIL | Input LOW Current | VCC = Max., VIN = 0.5 V | CLK, S16/32, OE: -1.0; Others: -0.5 |
IIH | Input HIGH Current | VCC = Max., VIN = 2.4 V | CLK, S16/32, OE: 100; Others: 50 |























II | Input HIGH Current | VCC = Max., VIN = 5.5 V
IOZH, IOZL | F0-F31 Off State (High-Impedance) Output Current | VCC = Max. | VO = 2.4 V: 50; VO = 0.5 V
ISC | Output Short-Circuit Current (Note 2) | VCC = Max. +0.5 V | F0-F31 Outputs: -15 | -50; Flag Outputs: -15 | -50


ICC | Power Supply Current (Notes 3, 4) | COM'L Only
TC = +25°C: 1800 (Typical)
TC = 0 to +85°C Case Temp.
TC = +85°C Case Temp.: 1950


Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Operating Ranges for the applicable device type. 
2. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second.
3. Measured with OE LOW, and with all output bits (F0-F31 and flag outputs) LOW.
4. Worst-case Icc applies to cold start at lowest operating temperature. 
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SWITCHING CHARACTERISTICS over operating ranges unless otherwise specified 






| COM'L (Note 2) 


Tc =0 to +85°C Case Temp. 


re oe ret __Ame0925— | Amaeazsa | 
No. | Symbol | Description | Test Conditions | Min. | Max. | Min. | Max.
1 tasc Clocked Add, Subtract Time (R PLUS S, 
R MINUS S, 2 MINUS S) 
tcc 


Clocked Conversion Time (INT-TO-FP, 
FP-TO-INT, IEEE-TO-DEC, DEC-TO-IEEE) 


4 |tasuc Unclocked Add, Subtract Time (R, S to F, 110 
Flags) for R PLUS S, R MINUS S, 
and 2 MINUS S Instructions 















tMUC Unclocked Multiply Time (R, S to F, Flags) | FT0 = HIGH 110
for R TIMES S instruction FT1 = HIGH
110


Unclocked Conversion Time (R, S to F, 125 


tcuc 
Flags) for INT-TO-FP, FP-TO-INT, IEEE- 
TO-DEC and DEC-TO-IEEE Instructions 
tPpWL Clock Pulse Width LOW (Note 3) 


tPDOF1 Clock to F0-F31 and Flag Outputs FT0 = LOW 110
FT1 = HIGH


FT1 = LOW





125 


10 
11
12 
13 
14 


tppoF2 
tpzL OE Enable Time 
tpLz OE Disable Time _ 

ee tee 


tpzL16 _ | Clock ¢ to Fo-Fi5 Z to LOW $16/32 = HIGH 


ipeliie Enable, 16-Bit |/O Mode Z to HIGH ONEBUS = LOW 
tpLz716 Clock 1 to Fo-Fis LOW to Z 
ipaié Disable, 16-Bit 1/O Mode HIGH TO Z_ 
tpZL.16 Clock | to Fie- Fay Z to LOW $16/32 = HIGH 
teers Enable, 16-Bit |/O Mode Z to HIGH ONEBUS = LOW 
tpLz16 Clock t to Fig — Fa LOW to Z 
tphizi8 Disable,16-Bit 1/0 Mode HIGH to Z 
tSCE Register Clock Enable Setup Time FT0 = LOW

FT1 = LOW
tHCE Register Clock Enable Hold Time FT0 = LOW

FT1 = LOW


tSD1 R0-R31, S0-S31 Setup Time (Note 1) FT0 = LOW
tHD1 R0-R31, S0-S31 Hold Time (Note 1)


tSD2 R0-R31, S0-S31 Setup Time (Note 1) FT0 = HIGH
tHD2 R0-R31, S0-S31 Hold Time (Note 1) FT1 = LOW


tSI02 I0-I2 Instruction Select Setup Time FT for Destination
tHI02 I0-I2 Instruction Select Hold Time Register = LOW
I0-I2 Instruction Select to F0-F31, Flags FT1 = HIGH


tSI3 I3 Port S Input Select Setup Time FT1 = LOW
tHI3 I3 Port S Input Select Hold Time
tSI4 I4 Register R Input Select Setup Time FT0 = LOW
(Note 1)
tHI4 I4 Register R Input Select Hold Time
(Note 1)


tSRM Round Mode Select Setup Time FT for Destination
tHRM Round Mode Select Hold Time Register = LOW
tPDRF Round Mode Select to F0-F31, Flags FT1 = HIGH


Notes: 1. See timing diagram for desired mode of operation to determine clock edge to which these setup and hold times apply. 
2. It is the responsibility of the user to maintain a case temperature of 85°C or less. AMD recommends an air velocity of at least 200 linear feet per 
minute over the heat sink. 
3. Tester limitations necessitate this spec limit. Typical value shown is actual worst-case value. 
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SWITCHING TEST CIRCUITS 


S$; 


Vout 





{ 


TC001104 


R1 = (5.0 - VBE - VOL) / (IOL + VOL/1K)


A. Three-State Outputs 


Vec 


Ry = 9100 
S; 


Vout o—o" 


TC001084 


R2 = 2.4 V / IOH

R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2)


B. Normal Outputs 


Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
S1 and S2 are closed while S3 is open for tPZL test.
4. CL = 5.0 pF for output disable tests.
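
As a worked illustration of the normal-output load formulas, read here as R2 = 2.4 V / IOH and R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2), the following Python sketch computes the two resistors. The IOH = 1.0 mA and IOL = 4.0 mA figures are the test currents quoted in the DC characteristics; VBE = 0.7 V and VOL = 0.5 V are assumed values chosen only for illustration, and `load_resistors` is a hypothetical helper name.

```python
def load_resistors(i_oh, i_ol, v_ol, v_be=0.7):
    """Return (R1, R2) in ohms for the normal-output test load.

    i_oh and i_ol are current magnitudes in amperes, v_ol and v_be in volts.
    v_be = 0.7 V is an assumed diode drop, not a datasheet value.
    """
    r2 = 2.4 / i_oh                                   # R2 = 2.4 V / IOH
    r1 = (5.0 - v_be - v_ol) / (i_ol + v_ol / r2)     # R1 formula, normal outputs
    return r1, r2


if __name__ == "__main__":
    r1, r2 = load_resistors(i_oh=1.0e-3, i_ol=4.0e-3, v_ol=0.5)
    print(f"R1 = {r1:.0f} ohms, R2 = {r2:.0f} ohms")
```

With these assumed voltages R1 comes out a little above 900 ohms; the exact value depends on the VOL limit that applies to the device under test.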








SWITCHING TEST WAVEFORMS 


[Figure: Set-Up, Hold, and Release Times]
Notes: 1. Diagram shown for HIGH data only. Output transition may be opposite sense.
2. Cross-hatched area is don't-care condition.

[Figure: Pulse Width (LOW-HIGH-LOW pulse and HIGH-LOW-HIGH pulse)]

[Figure: Propagation Delay (same-phase and opposite-phase input transitions)]

[Figure: Enable and Disable Times]
Notes: 1. Diagram shown for Input Control Enable-LOW and Input Control Disable-HIGH.
2. S1, S2 and S3 of Load Circuit are closed except where shown.

Drawing numbers: WFR02970, WFR02790, WFR02980.





Notes on Test Methods

The following points give the general philosophy which we
apply to tests which must be properly engineered if they are to
be implemented in an automatic environment. The specifics of
what philosophies applied to which test are shown.

1. Ensure that the part is adequately decoupled at the test
head. Large changes in supply current when the device
switches may cause function failures due to Vcc changes.

2. Do not leave inputs floating during any tests, as they may
oscillate at high frequency.

3. Do not attempt to perform threshold tests at high speed.
Following an input transition, ground current may change by
as much as 400 mA in 5 to 8 ns. Inductance in the ground
cable may allow the ground pin at the device to rise by
hundreds of millivolts momentarily.

4. Use extreme care in defining input levels for AC tests. Many
inputs may be changed at once, so there will be significant
noise at the device pins which may not actually reach VIL or
VIH until the noise has settled. AMD recommends using
VIL ≤ 0 V and VIH ≥ 3 V for AC tests.

5. To simplify failure analysis, programs should be designed to
perform DC, Function, and AC tests as three distinct groups
of tests.

6. Capacitive Loading for AC Testing: Automatic testers and
their associated hardware have stray capacitance which
varies from one type of tester to another, but generally
around 50 pF. This, of course, makes it impossible to make
direct measurements of parameters which call for a smaller
capacitive load than the associated stray capacitance.
Typical examples of this are the so-called "float delays,"
which measure the propagation delays into and out of the
high-impedance state, and are usually specified at a load
capacitance of 5.0 pF. In these cases the test is performed
at the higher load capacitance (typically 50 pF), and
engineering correlations based on data taken with a bench
setup are used to predict the result at the lower capacitance.

Similarly, a product may be specified at more than one
capacitive load. Since the typical automatic tester is not
capable of switching loads in mid-test, it is impossible to
make measurements at both capacitances even though
they may both be greater than the stray capacitance. In
these cases, a measurement is made at one of the two
capacitances. The result at the other capacitance is
predicted from engineering correlations based on data
taken with a bench setup and the knowledge that certain
DC measurements (e.g., IOH, IOL) have already been taken
and are within specification. In some cases, special DC
tests are performed in order to facilitate this correlation.

7. Threshold Testing: The noise associated with automatic
testing, the long, inductive cables, and the high gain of
bipolar devices when in the vicinity of the actual device
threshold, frequently give rise to oscillations when testing
high-speed circuits. These oscillations are not indicative of a
reject device, but instead, of an overtaxed test system. To
minimize this problem, thresholds are tested at least once
for each input pin. Thereafter, "hard" high and low levels
are used for other tests. Generally this means that function
and AC testing are performed at "hard" input levels rather
than at VIL Max. and VIH Min.

8. AC Testing: Occasionally, parameters are specified which
cannot be measured directly on automatic testers because
of tester limitations. Data input hold times often fall into this
category. In these cases, the parameter in question is
guaranteed by correlating tests with other AC tests which
have been performed. These correlations are arrived at by
the cognizant engineer by using data from precise bench
measurements in conjunction with the knowledge that
certain DC parameters have already been measured and
are within specification.

In some cases, certain AC tests are redundant since they
can be shown to be predicted by other tests which have
already been performed. In these cases, the redundant
tests are not performed.
Clocked Operation: FT0 = LOW, FT1 = LOW
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SWITCHING WAVEFORMS (Cont'd.) 
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WF023770 


Clocked Operation: FT0 = HIGH, FT1 = LOW
Clocked Operation: FT0 = LOW, FT1 = HIGH


WF023780 





Flow-Through Operation (FT0 = HIGH, FT1 = HIGH)
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32-Bit, Single-Input Bus Mode 
Note 1. I4 has special setup and hold time requirements in this mode. All other control signals have timing
requirements as shown in the diagram "Clocked Operation: FT0 = LOW, FT1 = LOW."


16-Bit, Two-Input Bus Mode


OUTPUT ENABLE/DISABLE TIMING 








[Figure: equivalent circuits for a driven input, a three-state output, and a normal output, showing IOL and IOH; drawings IC000960 and IC000970]
CLK, S16/32, OE: R = 8 kΩ
All other inputs: R = 16 kΩ
CI = 5.0 pF, all inputs; CO = 5.0 pF, all outputs


Note: Actual current flow direction shown. 
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Am29C325 


CMOS 32-Bit Floating-Point Processor 


ADVANCE INFORMATION 


DISTINCTIVE CHARACTERISTICS 


• Single VLSI device performs high-speed floating-point arithmetic
  - Floating-point addition, subtraction, and multiplication in a single clock cycle
  - Internal architecture supports sum-of-products, Newton-Raphson division
• 32-bit, three-bus flow-through architecture
  - Programmable I/O allows interface to 32- and 16-bit systems
• IEEE and DEC formats
  - Performs conversions between formats
  - Performs integer <-> floating-point conversions
• Input and output registers can be made transparent independently
• Pin and functionally compatible with the bipolar Am29325
• The Am29C325 uses less than one-quarter the power of the Am29325
• 145 PGA requires no heatsink


GENERAL DESCRIPTION 


The Am29C325 is a high-speed floating-point processor 
unit. It performs 32-bit single-precision floating-point addi- 
tion, subtraction, and multiplication operations in a single 
VLSI circuit, using the format specified by the proposed 
IEEE floating-point standard, 754. The DEC single-preci- 
sion floating-point format is also supported. Operations for 
conversion between 32-bit integer format and floating-point 
format are available, as are operations for converting 
between the IEEE and DEC floating-point formats. Any 
operation can be performed in a single clock cycle. Six 
flags — invalid operation, inexact result, zero, not-a-num- 
ber, overflow, and underflow — monitor the status of opera- 
tions. 


The Am29C325 has a three-bus, 32-bit architecture, with
two input buses and one output bus. This configuration
provides high I/O bandwidth, allows access to all buses,
and affords a high degree of flexibility when connecting this
device in a system. All buses are registered, with each
register having a clock enable. Input and output registers
may be made transparent independently. Two other I/O
configurations, a 32-bit, two-bus architecture and a 16-bit,
three-bus architecture, are user-selectable, easing interface
with a wide variety of systems. Thirty-two-bit internal
feedforward datapaths support accumulation operations,
including sum-of-products and Newton-Raphson division.


Fabricated using Advanced Micro Devices' 1.2 micron 
CMOS process, the Am29C325 is powered by a single 5- 
volt supply. The device is housed in a 145-lead pin-grid- 
array package. 


Am29C300 FAMILY HIGH-PERFORMANCE SYSTEM BLOCK DIAGRAM 


[Block diagram: Am29C331 16-Bit Sequencer; Microprogram Memory; Pipeline Register; Am29C332 32-Bit ALU; Am29C334 Register File (64 x 18); Am29C323 Parallel Multiplier; control signals; drawing AF004651]

This document contains information on a product under development at Advanced Micro
Devices, Inc. The information is intended to help you to evaluate this product. AMD
reserves the right to change or discontinue work on this product without notice.

Publication # 07783   Rev. B   Amendment /0
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RELATED AMD PRODUCTS 
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BLOCK DIAGRAM 


[Block diagram: inputs R0-R31 and S0-S31; Register R; Port R and Port S of the floating-point ALU; CLK; select and enable logic; Port F; Register F; status flag generator and status flag register; flag outputs INEXACT, INVALID, NAN, OVERFLOW, UNDERFLOW]
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CONNECTION DIAGRAM 
Bottom View 


PGA 


A B Cc D E F G _H J K L M N P R 
14 OBUS OE vec CLK R31 R30 R25 . 


FTO FTI VCC VCC. RNDO NDI R27 ~—-R2B 


GND ENR ENS 16/32 VCC VCC VCC R29 R26 GNO 


GND 


UNFL 


F28 


F25 


Vcc 


VCC 


F16 


F14 


F10 


GND GND GND GND 





F2 GND FO St! 
FY GND _P/AFF. SO 
CD010491 
Key: 16/32 = S16/32
I/D = IEEE/DEC
INEX = INEXACT
INVA = INVALID
OBUS = ONEBUS
OVFL = OVERFLOW
P/AFF = PROJ/AFF
UNFL = UNDERFLOW


*D4 is an alignment pin (not connected internally). 
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PIN DESIGNATIONS 

(Sorted by Pin No.) 
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PIN DESIGNATIONS (Cont'd.) 





(Sorted by Pin Name) 
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LOGIC SYMBOL 
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ORDERING INFORMATION 
Standard Products 


AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is
formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Package Type
d. Temperature Range
e. Optional Processing


AM29C325 -1 G C B


e. OPTIONAL PROCESSING
Blank = Standard processing
B = Burn-in


d. TEMPERATURE RANGE
C = Commercial (0 to +85°C) Case


c. PACKAGE TYPE
G = 145-Lead Pin Grid Array without Heatsink
(CGX145)


b. SPEED OPTION
-1 = Speed Select


a. DEVICE NUMBER/DESCRIPTION 
Am29C325 
CMOS 32-Bit Floating-Point Processor 


Valid Combinations
AM29C325    GC, GCB
AM29C325-1


Valid Combinations

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released combinations, and
to obtain additional data on AMD's standard military grade
products.
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MILITARY ORDERING INFORMATION 
APL Products 


AMD products for Aerospace and Defense applications are available in several packages and operating ranges. APL (Approved
Products List) products are fully compliant with MIL-STD-883C requirements. The order number (Valid Combination) for APL
products is formed by a combination of:
a. Device Number
b. Speed Option (if applicable)
c. Device Class
d. Package Type
e. Lead Finish


AM29C325 /B Z C


e. LEAD FINISH
C = Gold


d. PACKAGE TYPE
Z = 145-Lead Pin Grid Array without Heatsink
(CGX145)


c. DEVICE CLASS
/B = Class B


b. SPEED OPTION
Not Applicable


a. DEVICE NUMBER/DESCRIPTION
Am29C325
CMOS 32-Bit Floating-Point Processor

Valid Combinations

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations or to check for newly released valid
combinations.

Group A Tests

Group A tests consist of Subgroups
1, 2, 3, 7, 8, 9, 10, 11.
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PIN DESCRIPTION 


CLK Clock (Input)
For the internal registers.


ENF Register F Clock Enable (Input; Active LOW) 
When ENF is LOW, register F is clocked on the LOW-to- 
HIGH transition of CLK. When ENF is HIGH, register F 
retains the previous contents. 


ENR Register R Clock Enable (Input; Active LOW) 
When ENR is LOW, register R is clocked on the LOW-to- 
HIGH transition of CLK. When ENR is HIGH, register R 
retains the previous contents. 


ENS Register S Clock Enable (Input; Active LOW) 
When ENS is LOW, register S is clocked on the LOW-to- 
HIGH transition of CLK. When ENS is HIGH, register S 
retains the previous contents. 


F0-F31 F Operand Bus (Output)
F0 is the least-significant bit.


FT0 Input Register Feedthrough Control (Input;
Active HIGH)
When FT0 is HIGH, registers R and S are transparent.


FT1 Output Register Feedthrough Control (Input;
Active HIGH)
When FT1 is HIGH, register F and the status flag register
are transparent.


I0-I2 Operation Select Lines (Input)
Used to select the operation to be performed by the ALU.
See Table 1 for a list of operations and the corresponding
codes.


I3 ALU S Port Input Select (Input)
A LOW on I3 selects register S as the input to the ALU S
port. A HIGH on I3 selects register F as the input to the ALU
S port.


I4 Register R Input Select (Input)
A LOW on I4 selects R0-R31 as the input to register R. A
HIGH selects the ALU F port as the input to register R.


IEEE/DEC IEEE/DEC Mode Select (Input)
When IEEE/DEC is HIGH, IEEE mode is selected. When 
IEEE/DEC is LOW, DEC mode is selected. 


INEXACT  Inexact Result Flag (Output; Active HIGH) 
A HIGH indicates that the final result of the last operation 
was not infinitely precise, due to rounding. 


INVALID Invalid Operation Flag (Output; Active 
HIGH) 
A HIGH indicates that the last operation performed was
invalid; e.g., ∞ times 0.


Definition of Terms 
Affine Mode 


One of two modes affecting the handling of operations on 
infinities — see the Operations with Infinities section under 
Operations in IEEE Mode. 


Biased Exponent 


The true exponent of a floating-point number, plus a constant. 
For IEEE floating-point numbers, the constant is 127; for DEC 
floating-point numbers, the constant is 128. See also True 
Exponent. 


Bus 


Data input or output channel for the floating-point processor. 


NAN Not-a-Number Flag (Output; Active HIGH) 
A HIGH indicates that the final result produced by the last 
operation is not to be interpreted as a number. The output in 
such cases is either an IEEE Not-a-Number (NAN) or a 
DEC-reserved operand. 


OE Output Enable (Input; Active LOW) 
When OE is LOW, the contents of register F are placed on
F0-F31. When OE is HIGH, F0-F31 assume a high-
impedance state.


ONEBUS Input Bus Configuration Control (Input) 
A LOW on ONEBUS configures the input bus circuitry for 
two-input bus operation. A HIGH on ONEBUS configures 
the input bus circuitry for single-input bus operation. 


OVERFLOW Overflow Flag (Output; Active HIGH) 
A HIGH indicates that the last operation produced a final 
result that overflowed the floating-point format. 


PROJ/AFF  Projective/Affine Mode Select (Input) 
Choice of projective or affine mode determines the way in 
which infinities are handled in IEEE mode. A LOW on 
PROJ/AFF selects affine mode; a HIGH selects projective
mode. 


R0-R31 R Operand Bus (Input)
R0 is the least-significant bit.


RND0, RND1 Rounding Mode Selects (Input)
RND0 and RND1 select one of four rounding modes. See
Table 5 for a list of rounding modes and the corresponding
control codes.


S0-S31 S Operand Bus (Input)
S0 is the least-significant bit.


S16/32 16- or 32-Bit I/O Mode Select (Input)

A LOW on S16/32 selects the 32-bit I/O mode; a HIGH
selects the 16-bit I/O mode. In 32-bit mode, input and
output buses are 32 bits wide. In 16-bit mode, input and 
output buses are 16 bits wide, with the least- and most- 
significant portions of the 32-bit input and output words 
being placed on the buses during the HIGH and LOW 
portions of CLK, respectively. 
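
The 16-bit transfer order described above can be sketched in Python as follows. This is only an illustration of the stated ordering (least-significant half during the HIGH portion of CLK, most-significant half during the LOW portion); the function names `split_word` and `join_halves` are hypothetical and do not model bus timing.

```python
def split_word(word):
    """Split a 32-bit word into (half sent while CLK is HIGH, half sent while CLK is LOW)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    least_significant = word & 0xFFFF          # placed on the bus while CLK is HIGH
    most_significant = (word >> 16) & 0xFFFF   # placed on the bus while CLK is LOW
    return least_significant, most_significant


def join_halves(clk_high_half, clk_low_half):
    """Rebuild the 32-bit word from the two 16-bit transfers."""
    return ((clk_low_half & 0xFFFF) << 16) | (clk_high_half & 0xFFFF)


if __name__ == "__main__":
    low, high = split_word(0x3F800000)
    print(hex(low), hex(high), hex(join_halves(low, high)))
```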


UNDERFLOW  Underflow Flag (Output; Active HIGH)
A HIGH indicates that the last operation produced a 
rounded result that underflowed the floating-point format. 


ZERO Zero Flag (Output; Active HIGH) 
A HIGH indicates that the last operation produced a final 
result of zero. 


DEC-Reserved Operand 


A DEC floating-point number that is interpreted as a symbol 
and has no numeric value. A DEC-reserved operand has a 
sign of 1 and a biased exponent of 0. 


Destination Format 


The format of the final result produced by the floating-point 
ALU. The destination format can be IEEE floating point, DEC 
floating point, or integer. 


Final Result 
The result produced by the floating-point ALU. 
Fraction 


The 23 least-significant bits of the mantissa. 


Infinitely Precise Result 


The result that would be obtained from an operation if both 
exponent range and precision were unbounded. 


Input Operands 


The value or values on which an operation is performed. For 
example, the addition 2 + 3=5 has input operands 2 and 3. 


Mantissa 


The portion of a floating-point number containing the number's
significant bits. For the floating-point number 1.101 x 2^-3, the
mantissa is 1.101.


NAN (Not-a-Number) 


An IEEE floating-point number that is interpreted as a symbol,
and has no numeric value. A NAN has a biased exponent of
255 (decimal) and a non-zero fraction.


Port 
Data input or output channel for the floating-point ALU. 
Projective Mode 


One of two modes affecting the handling of operations on 
infinities — see the Operations with Infinities section under 
Operation in IEEE Mode. 


Rounded Result


The result produced by rounding the infinitely precise result to 
fit the destination format. 


True Exponent (or Exponent) 


Number representing the power of two by which a floating-point
number's mantissa is to be multiplied. For the floating-point
number 1.101 x 2^-3, the true exponent is -3.
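
The definitions above (biased exponent, fraction, NAN, DEC-reserved operand, true exponent) can be tied together with a small Python sketch. It assumes a 32-bit word laid out as sign in bit 31, biased exponent in bits 30-23, and fraction in bits 22-0; that bit layout and the helper name `decode_fields` are assumptions made for illustration and are not taken from this section.

```python
def decode_fields(word, ieee_mode=True):
    """Extract sign, biased exponent, and fraction from a 32-bit word.

    Assumed layout: bit 31 = sign, bits 30-23 = biased exponent,
    bits 22-0 = fraction (the 23 least-significant mantissa bits).
    """
    sign = (word >> 31) & 1
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    bias = 127 if ieee_mode else 128           # constants from "Biased Exponent"
    if ieee_mode:
        # NAN: biased exponent of 255 and a non-zero fraction
        not_a_number = biased_exponent == 255 and fraction != 0
    else:
        # DEC-reserved operand: sign of 1 and biased exponent of 0
        not_a_number = sign == 1 and biased_exponent == 0
    return {
        "sign": sign,
        "biased_exponent": biased_exponent,
        "fraction": fraction,
        "true_exponent": biased_exponent - bias,
        "not_a_number": not_a_number,
    }


if __name__ == "__main__":
    print(decode_fields(0x3F800000))                   # IEEE pattern with biased exponent 127
    print(decode_fields(0x80000000, ieee_mode=False))  # DEC-reserved operand pattern
```

The `true_exponent` entry is only meaningful for ordinary numbers; for symbols such as NANs or reserved operands it is just the arithmetic difference.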


FUNCTIONAL DESCRIPTION 
Architecture 


The Am29C325 comprises a high-speed, floating-point ALU, a 
status flag generator, and a 32-bit data path. 


Floating-Point ALU 


The floating-point ALU performs 32-bit floating-point operations.
It also performs floating-point-to-integer conversions,
integer-to-floating-point conversions, and conversions
between the IEEE and DEC formats. The ALU has
two 32-bit input ports, R and S, and a 32-bit output port, F.


Conceptually, the process performed by the ALU can be 
divided into three stages (see Figure 1). The operation stage 
performs the arithmetic operation selected by the user; the 
output of this section is referred to as the infinitely precise 
result of the operation. The rounding stage rounds the 
infinitely precise result to fit in the destination format; the 
output of this stage is called the rounded result. The last stage 
checks for exceptional conditions. If no exceptional condition 
is found, the rounded result is passed through this stage. If 
some exceptional condition is found (e.g., overflow, underflow, 
or an invalid operation), this section may replace the rounded
result with another output, such as +∞, -∞, a NAN, or a
DEC-reserved operand. The output of this last stage appears on
port F, and is called the final result.


[Figure 1 flow: OPERAND R and OPERAND S enter the OPERATION STAGE (performs selected operation), which produces the INFINITELY PRECISE RESULT; the ROUNDING STAGE (rounds infinitely precise result) produces the ROUNDED RESULT; the EXCEPTION STAGE (checks for unusual conditions) produces the FINAL RESULT on port F; drawing AF004540]


Figure 1. Conceptual Model of the Process 
Performed by the Floating-Point ALU 
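
The three conceptual stages can be mimicked in software. The Python sketch below models only an addition, uses exact rational arithmetic for the infinitely precise result, and rounds to a 24-bit significand with round-to-nearest-even. The rounding mode, the 2^128 overflow and 2^-126 underflow thresholds (IEEE single precision), and all function names are assumptions for illustration; the device's actual replacement outputs and flag rules are those given in its flag tables.

```python
from fractions import Fraction

SIGNIFICAND_BITS = 24                      # 23-bit fraction plus the leading bit
OVERFLOW_LIMIT = Fraction(2) ** 128        # assumed IEEE single-precision threshold
UNDERFLOW_LIMIT = Fraction(2) ** -126      # assumed IEEE single-precision threshold


def operation_stage(r, s):
    """Compute the infinitely precise result (here, R PLUS S)."""
    return Fraction(r) + Fraction(s)


def rounding_stage(exact):
    """Round to 24 significant bits, nearest-even, with unbounded exponent."""
    if exact == 0:
        return exact
    magnitude = abs(exact)
    e = magnitude.numerator.bit_length() - magnitude.denominator.bit_length()
    if Fraction(2) ** e > magnitude:       # make 2**e <= magnitude < 2**(e + 1)
        e -= 1
    ulp = Fraction(2) ** (e - (SIGNIFICAND_BITS - 1))
    return round(exact / ulp) * ulp        # round() on a Fraction is half-to-even


def exception_stage(exact, rounded):
    """Check for unusual conditions and return (final result, flags)."""
    overflow = abs(rounded) >= OVERFLOW_LIMIT
    underflow = 0 < abs(rounded) < UNDERFLOW_LIMIT
    final = rounded
    if overflow:                           # replace the rounded result with an infinity
        final = float("inf") if rounded > 0 else float("-inf")
    flags = {
        "OVF": overflow,
        "UNF": underflow,
        "INE": overflow or rounded != exact,
        "ZER": final == 0,
    }
    return final, flags


def fp_add(r, s):
    exact = operation_stage(r, s)
    return exception_stage(exact, rounding_stage(exact))


if __name__ == "__main__":
    print(fp_add(1.0, 2.0 ** -30))         # the small addend is lost to rounding
```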


The ALU performs one of eight operations; the operation to be
performed is selected by placing the appropriate control code
on lines I0-I2. Table 1 gives the control codes corresponding
to each of the eight operations.


The floating-point addition operation (R PLUS S) adds the 
floating-point numbers on ports R and S, and places the 
floating-point result on port F. In IEEE mode (IEEE/ 
DEC = HIGH) the addition is performed in IEEE floating-point 
format; in DEC mode (IEEE/DEC = LOW) the addition is 
performed in DEC format. 


The floating-point subtraction operation (R MINUS S) subtracts
the floating-point number on port S from the floating-point
number on port R and places the floating-point result on
port F. In IEEE mode (IEEE/DEC = HIGH) the subtraction is
performed in IEEE floating-point format; in DEC mode
(IEEE/DEC = LOW) the subtraction is performed in DEC
format.


The floating-point multiplication operation (R TIMES S) multi- 
plies the floating-point numbers on ports R and S, and places 
the floating-point result on port F. In IEEE mode (IEEE/ 
DEC = HIGH) the multiplication is performed in IEEE floating- 
point format; in DEC mode (IEEE/DEC = LOW) the multiplica- 
tion is performed in DEC format. 


The floating-point constant subtraction (2 MINUS S) operation 
subtracts the floating-point value on port S from 2, and places 
the result on port F. The operand on port R is not used in this 
operation; its value will not affect the operation in any way. In 
IEEE mode (IEEE/DEC = HIGH) the operation is performed in 
IEEE floating-point format; in DEC mode (IEEE/DEC = LOW) 
the operation is performed in DEC format. This operation is 
used to support Newton-Raphson floating-point division; a
description of its use appears in Appendix C.


The integer-to-floating-point conversion (INT-TO-FP) operation
takes a 32-bit, two's-complement integer on port R and
places the equivalent floating-point value on port F. The
operand on port S is not used in this operation; its value will
not affect the operation in any way. In IEEE mode (IEEE/
DEC = HIGH) the result is delivered in IEEE format; in DEC
mode (IEEE/DEC = LOW) the result is delivered in DEC
format.


The floating-point-to-integer conversion (FP-TO-INT) operation
takes a floating-point number on port R and places the
equivalent 32-bit, two's-complement integer value on port F.
The operand on port S is not used in this operation; its value
will not affect the operation in any way. In IEEE mode (IEEE/
DEC = HIGH) the operand on port R is interpreted using the
IEEE floating-point format; in DEC mode (IEEE/DEC = LOW)
it is interpreted using the DEC floating-point format.


The IEEE-to-DEC conversion operation (IEEE-TO-DEC) takes
an IEEE-format floating-point number on port R and places the
equivalent DEC-format floating-point number on port F. The
operand on port S is not used in this operation; its value will
not affect the operation in any way. The operation can be
performed in either IEEE mode (IEEE/DEC = HIGH) or DEC
mode (IEEE/DEC = LOW).


The DEC-to-IEEE conversion operation (DEC-TO-IEEE) takes
a DEC-format floating-point number on port R and places the
equivalent IEEE-format floating-point number on port F. The operand
on port S is not used in this operation; its value will not affect
the operation in any way. The operation can be performed in
either IEEE mode (IEEE/DEC = HIGH) or DEC mode (IEEE/
DEC = LOW).


Status Flag Generator


The status flag generator controls the state of six flags that
report the status of floating-point ALU operations. The flags
indicate when an operation is invalid (e.g., ∞ times 0) or when
an operation has produced an overflow, an underflow, a
non-numerical result (e.g., a NAN or DEC-reserved operand), an
inexact result, or a result of zero. The flags represent the
status of the most recently performed operation. Flag status is
stored in the flag status register on the LOW-to-HIGH transition
of CLK. When the output register feedthrough control FT1
is HIGH, the flag status register is made transparent.


TABLE 1. ALU OPERATION SELECT

Operation | Mnemonic | Output Equation
Floating-point addition | R PLUS S |
Floating-point subtraction | R MINUS S |
Floating-point multiplication | R TIMES S |
Floating-point constant subtraction | 2 MINUS S |
Integer-to-floating-point conversion | INT-TO-FP | F (floating-point) = R (integer)
Floating-point-to-integer conversion | FP-TO-INT | F (integer) = R (floating-point)
IEEE-TO-DEC format conversion | IEEE-TO-DEC | F (DEC format) = R (IEEE format)
DEC-TO-IEEE format conversion | DEC-TO-IEEE | F (IEEE format) = R (DEC format)
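
To show why a 2 MINUS S operation is useful for Newton-Raphson division, the Python sketch below iterates x ← x · (2 − b·x), which converges to 1/b when the starting estimate lies between 0 and 2/b. This is the textbook recurrence, not a transcription of the procedure in Appendix C; the caller-supplied seed, the iteration count, and the helper names (`to_single`, `two_minus`, `times`, `reciprocal`, `divide`) are assumptions for illustration, and each step is rounded to single precision to loosely mimic a 32-bit data path.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_minus(s):
    """Software stand-in for the 2 MINUS S operation."""
    return to_single(2.0 - s)


def times(r, s):
    """Software stand-in for the R TIMES S operation."""
    return to_single(r * s)


def reciprocal(b, seed, iterations=4):
    """Approximate 1/b; each pass uses two multiplies and one 2 MINUS S."""
    x = to_single(seed)
    for _ in range(iterations):
        x = times(x, two_minus(times(b, x)))
    return x


def divide(a, b, seed, iterations=4):
    """Approximate a/b as a times the reciprocal of b."""
    return times(a, reciprocal(b, seed, iterations))


if __name__ == "__main__":
    print(divide(1.0, 3.0, seed=0.3))
```

Because the error roughly squares on every pass, a seed accurate to a few bits reaches single precision in a handful of iterations.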


Data Path 


The 32-bit data path consists of the R and S input buses; the F
output bus; data registers R, S, and F; the register R input
multiplexer; and the ALU port S input multiplexer.


Input operands enter the floating-point processor through the
32-bit R and S input buses, R0-R31 and S0-S31. Results of
operations appear on the 32-bit F bus, F0-F31. The F bus
assumes a high-impedance state when output enable OE is
HIGH.


The R and S registers store input operands; the F register
stores the final result of the floating-point ALU operation. Each
register has an independent clock enable (ENR, ENS, and
ENF). When a register's clock enable is LOW, the register
stores the data on its input at the LOW-to-HIGH transition of
CLK; when the clock enable is HIGH, the register retains its
current data. All data registers are fully edge-triggered: both
the input data and the register enable need only meet modest
setup and hold time requirements. Registers R and S can be
made transparent by setting FT0, the input register feedthrough
control, HIGH. Register F can be made transparent by
setting FT1, the output register feedthrough control, HIGH.
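
The register behavior just described (active-LOW clock enable, capture on the rising edge of CLK, optional transparency) can be captured in a few lines of Python. This is a behavioral sketch only; the class name `DataRegister` and its method names are hypothetical, and setup and hold timing is not modeled.

```python
class DataRegister:
    """Behavioral model of one data register (R, S, or F)."""

    def __init__(self):
        self.stored = 0

    def rising_edge(self, data_in, enable_n):
        """Apply a LOW-to-HIGH transition of CLK.

        enable_n is the active-LOW clock enable (ENR, ENS, or ENF):
        0 stores data_in, 1 retains the current contents.
        """
        if enable_n == 0:
            self.stored = data_in

    def output(self, data_in, feedthrough):
        """Return the register output; a HIGH feedthrough control makes it transparent."""
        return data_in if feedthrough else self.stored


if __name__ == "__main__":
    reg = DataRegister()
    reg.rising_edge(data_in=5, enable_n=1)        # enable HIGH: contents retained
    print(reg.output(data_in=9, feedthrough=0))
    reg.rising_edge(data_in=5, enable_n=0)        # enable LOW: data stored
    print(reg.output(data_in=9, feedthrough=0), reg.output(data_in=9, feedthrough=1))
```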


The register R input multiplexer selects either the R input bus
or the floating-point ALU's F port as the input to register R.
Selection is controlled by I4: a LOW selects the R input bus;
a HIGH selects the ALU F port. The ALU port S input
multiplexer selects either register S or register F as the input to
the floating-point ALU's S port. Selection is controlled by I3:
a LOW selects register S; a HIGH selects register F.


Data selected by I3 and I4 is described in Table 2. When 
registers R and S are transparent (FT0 = HIGH), multiplexer 
select I4 must be kept LOW, so that the register R input 
multiplexer selects R0 - R31. When register F is transparent 
(FT1 = HIGH), multiplexer select I3 must be kept LOW, so that 
the ALU port S input multiplexer selects register S. 











TABLE 2. MUX SELECT 


I3 | Data selected for floating-point ALU S port 
0  | Register S 
1  | Register F 

I4 | Data selected for register R input 
0  | R input bus (R0 - R31) 
1  | Floating-point ALU port F 


I/O Modes 


The Am29C325 data path can be configured in one of three 
I/O modes: a 32-bit, two-input bus mode; a 32-bit, single-input 
bus mode; and a 16-bit, two-input bus mode. These modes 
affect only the manner in which data is delivered to and taken 
from the Am29C325; operation of the floating-point ALU is not 
altered. The I/O mode is selected with the ONEBUS and 
S16/32 controls. Table 3 lists the control codes needed to invoke 
each I/O mode. 







TABLE 3. I/O MODE SELECTION 


ONEBUS | S16/32 | I/O Mode 
LOW    | LOW    | 32-bit, two-input-bus mode 
HIGH   | LOW    | 32-bit, single-input-bus mode (*) 
LOW    | HIGH   | 16-bit, two-input-bus mode (*) 
HIGH   | HIGH   | Illegal I/O mode selection value 


*FT0 must be held LOW in this mode (see text). 









32-Bit, Two-Input Bus Mode 


In this I/O mode, the R and S buses are configured as 
independent 32-bit input buses, and the F bus is configured as 
a 32-bit output bus. Figure 2 is a functional block diagram of 
the Am29C325 in this I/O mode. 


R and S operands are taken from their respective input buses 
and clocked into the R and S registers on the LOW-to-HIGH 
transition of CLK. Register F is also clocked on the LOW-to-HIGH 
transition of CLK. Figure 5(a) depicts typical I/O timing 
in this mode. 





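[Figure 2 block diagram: inputs S0 - S31, ENR, and CLK, with ONEBUS = LOW and S16/32 = LOW; the floating-point ALU; and the 32-bit F bus output F0 - F31.] 


Figure 2. Functional Block Diagram for the 32-Bit, Two-Input Bus Mode 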


32-Bit, Single-Input Bus Mode 


In this I/O mode, the R and S buses are connected to a single 
32-bit multiplexed input data bus; the F bus is configured as an 
independent 32-bit output bus. Figure 3 is a functional block 
diagram of the Am29C325 in this I/O mode. Note that both the 
R and S bus lines must be wired to the input bus. 


R and S operands are multiplexed onto the input bus by the 
host system. The S operand is clocked from the input bus into 
a temporary holding register on the HIGH-to-LOW transition of 
CLK and is transferred to register S on the LOW-to-HIGH 
transition of CLK. The R operand is clocked from the input bus 
into register R on the LOW-to-HIGH transition of CLK. Register 
F is clocked on the LOW-to-HIGH transition of CLK. Figure 
5(b) depicts typical I/O timing in this mode. 


When placed in this I/O mode, the data path will not function 
properly if the R and S registers are made transparent. 
Therefore, input register feedthrough control FT0 must be held 
LOW in this mode. 


Figure 3. Functional Block Diagram for the 32-Bit, Single-Input Bus Mode 


16-Bit, Two-Input Bus Mode 


In this I/O mode, the R and S buses are configured as 
independent 16-bit input buses, and the F bus is configured as 
a 16-bit output bus. Figure 4 is a functional block diagram of 
the Am29C325 in this I/O mode. Note that the 16 least-significant 
bits (LSBs) and 16 most-significant bits (MSBs) of 
the R, S, and F buses must be wired to their respective system 
buses in parallel. 


Thirty-two-bit operands are passed along the 16-bit data 
buses by time-multiplexing the 16 LSBs and 16 MSBs of each 
32-bit word. For the R input bus, the host system multiplexes 
the 16 LSBs and 16 MSBs of the R operand onto the 16-bit R 
bus. The 16 LSBs of the R operand are stored in a temporary 
holding register on the HIGH-to-LOW transition of CLK. The 16 
MSBs are clocked into register R on the LOW-to-HIGH 
transition of CLK; at the same time, the 16 LSBs are 
transferred from the temporary holding register to register R. 
Transfer of data from the S input bus to the S register takes 
place in a similar fashion. Register F is clocked on the LOW- 
to-HIGH transition of CLK. Circuitry internal to the Am29C325 
multiplexes data from register F onto the 16-bit output bus by 
enabling the 16 LSBs of the F output bus when CLK is HIGH, 
and enabling the 16 MSBs of the F output bus when CLK is 
LOW. Figure 5(c) depicts typical I/O timing in this mode. 


When placed in this I/O mode, the data path will not function 
properly if the R and S registers are made transparent. 
Therefore, input register feedthrough control FT0 must be held 
LOW in this mode. Caution must also be taken in controlling 
the register R input multiplexer control line, I4, in this I/O 
mode. I4 should be changed only when CLK is HIGH, in 
addition to meeting the setup and hold time requirements 
given in the Switching Characteristics section. 
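
The time-multiplexing described above can be pictured with a short Python sketch. It is only an illustration of how a host might order the two halves of a 32-bit operand; the function names are hypothetical, and no timing behavior of the device is modeled.

```python
def split_for_16_bit_bus(word: int) -> tuple[int, int]:
    """Split a 32-bit operand into the two halves sent over a 16-bit bus.

    The 16 LSBs go first (captured on the HIGH-to-LOW transition of CLK),
    followed by the 16 MSBs (captured on the LOW-to-HIGH transition).
    """
    lsbs = word & 0xFFFF
    msbs = (word >> 16) & 0xFFFF
    return lsbs, msbs


def assemble_from_16_bit_bus(lsbs: int, msbs: int) -> int:
    """Rebuild the 32-bit word from the held LSBs and the arriving MSBs."""
    return ((msbs & 0xFFFF) << 16) | (lsbs & 0xFFFF)
```

For instance, the word 40600000₁₆ would be presented as 0000₁₆ followed by 4060₁₆.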


Operation in IEEE Mode 


When input signal IEEE/DEC is HIGH, the IEEE mode of 
operation is selected. In this mode the Am29C325 uses the 
floating-point format set forth in the IEEE Proposed Standard 
for Binary Floating-Point Arithmetic, P754. In addition, the 
IEEE mode complies with most other aspects of single- 
precision floating-point operation outlined in the proposed 
standard — differences are discussed in Appendix A. 


IEEE Floating-Point Format 


The IEEE single-precision floating-point word is 32 bits wide, 
and is arranged in the format shown in Figure 6. The floating- 
point word is divided into three fields: a single-bit sign, an 8-bit 
biased exponent, and a 23-bit fraction. 


The sign bit indicates the sign of the floating-point number's 
value. Non-negative values have a sign of 0; negative values, 
a sign of 1. The value zero may have either sign. 


The biased exponent is an 8-bit unsigned integer field representing 
a multiplicative factor of some power of two. The bias 
value is 127. If, for example, the multiplicative factor for a 
floating-point number is to be 2^a, the value of the biased 
exponent would be a + 127; "a" is called the true exponent. 


The fraction is a 23-bit unsigned fraction field containing the 
23 LSBs of the floating-point number's 24-bit mantissa. The 
weight of the fraction's MSB is 2^-1; the weight of the LSB is 2^-23. 


Figure 4. Functional Block Diagram for the 16-Bit, Two-Input Bus Mode 











A floating-point number is evaluated or interpreted per the 
following conventions: 


let s = sign bit 
    e = biased exponent 
    f = fraction 


if e = 0 and f = 0 ... value = (-1)^s * (0) (+0, -0) 

if e = 0 and f ≠ 0 ... value = denormalized number 

if 0 < e < 255 ... value = (-1)^s * (2^(e - 127)) * (1.f) 
(normalized number) 

if e = 255 and f = 0 ... value = (-1)^s * (∞) (+∞, -∞) 

if e = 255 and f ≠ 0 ... value = not-a-number (NAN) 
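
The conventions above translate directly into a few lines of Python. This is a minimal sketch for checking words by hand; the function names are hypothetical and it says nothing about how the device implements the decoding.

```python
def classify_ieee_single(word: int) -> str:
    """Classify a 32-bit IEEE single-precision word by its e and f fields."""
    e = (word >> 23) & 0xFF       # 8-bit biased exponent
    f = word & 0x7FFFFF           # 23-bit fraction
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    if e == 255:
        return "infinity" if f == 0 else "NAN"
    return "normalized"


def normalized_value(word: int) -> float:
    """Evaluate a normalized word as (-1)^s * 2^(e - 127) * (1.f)."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if not 0 < e < 255:
        raise ValueError("only normalized numbers are handled here")
    mantissa = 1 + f / 2**23      # restore the implied leading 1
    return (-1) ** s * 2.0 ** (e - 127) * mantissa
```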


Zero: The value zero can have either a positive or negative 
sign. Rules for determining the sign of a zero produced by an 
operation are given in the Sign Bit section. 


Denormalized Number: A denormalized number represents a 
quantity with magnitude less than 2^-126 but greater than zero. 


Normalized Number: A normalized number represents a 
quantity with magnitude greater than or equal to 2^-126 but 
less than 2^128. 


Example 1: 


The number +3.5 can be represented in floating-point 
format as follows: 


+3.5 = 11.1₂ x 2^0 
     = 1.11₂ x 2^1 


sign = 0 


biased exponent = 1₁₀ + 127₁₀ = 128₁₀ 
                = 10000000₂ 


fraction = 11000000000000000000000₂ 
(the leading 1 is implied in the format) 


Concatenating these fields produces the floating-point word 
40600000₁₆. 


[Figure 5 timing diagrams: a) 32-Bit, Two-Input-Bus Mode; b) 32-Bit, Single-Input-Bus Mode; c) 16-Bit, Two-Input-Bus Mode] 


Figure 5. Typical Bus Timing for the I/O Modes with FT0 = LOW, FT1 = LOW 











[Figure 6 layout: bit 31 = sign (S); bits 30 - 23 = biased exponent (E); bits 22 - 0 = fraction (F), with weights 2^-1 down to 2^-23. VALUE = (-1)^S (2^(E - 127)) (1.F)] 


Figure 6. IEEE Mode Single-Precision Floating-Point Format 


Example 2: 


The number -11.375 can be represented in floating-point 
format as follows: 


-11.375 = -1011.011₂ x 2^0 
        = -1.011011₂ x 2^3 


sign = 1 
biased exponent = 3₁₀ + 127₁₀ = 130₁₀ 
                = 10000010₂ 


fraction = 01101100000000000000000₂ 
(the leading 1 is implied in the format) 


Concatenating these fields produces the floating-point word 
C1360000₁₆. 
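
The concatenation step in the two examples can be reproduced with a small Python sketch. The function name is hypothetical; it covers only normalized numbers and takes the three fields as already worked out by hand.

```python
def pack_ieee_single(sign: int, true_exponent: int, fraction: int) -> int:
    """Concatenate sign, biased exponent, and 23-bit fraction into one word."""
    biased = true_exponent + 127              # the bias value is 127
    if not 0 < biased < 255:
        raise ValueError("exponent outside the normalized range")
    if not 0 <= fraction < 2**23:
        raise ValueError("fraction must fit in 23 bits")
    return (sign << 31) | (biased << 23) | fraction


# +3.5 = 1.11 (binary) x 2^1: fraction bits 11 followed by 21 zeros
word_a = pack_ieee_single(0, 1, 0b11 << 21)
# -11.375 = -1.011011 (binary) x 2^3: fraction bits 011011 followed by 17 zeros
word_b = pack_ieee_single(1, 3, 0b011011 << 17)
```

Here `word_a` and `word_b` come out as 40600000₁₆ and C1360000₁₆, matching Examples 1 and 2.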




Infinity: Infinity can have either a positive or negative sign. 
The way in which infinities are interpreted is determined by the 
state of the projective/affine mode select, PROJ/AFF. 


Not-a-Number: A not-a-number, or NAN, does not represent 
a numeric value, but is interpreted as a signal or symbol. NANs 
are used to indicate invalid operations, and as a means of 
passing process status information through a series of calculations. 
NANs arise in two ways: 1) they can be generated by the 
Am29C325 to indicate that an invalid operation has taken 
place (e.g., ∞ x 0), or 2) they can be provided by the user as an 
input operand. There are two types of NANs, signalling and quiet 
(see Figure 7 for formats). 


IEEE Mode Integer Format 


Integer numbers are represented as 32-bit, two's-complement 
words (Figure 8 depicts the integer format). The integer word 
can represent a range of integer values from -2^31 to 2^31 - 1. 





[Figure 7 layouts: signalling NAN and quiet NAN bit fields (sign, biased exponent, fraction). X = don't care. At least one of the twenty-two LSBs of a quiet NAN must be 1.] 


Figure 7. Signalling and Quiet NAN Formats 


[Figure 8 layout: bit numbers 31 down to 0 carry the weights -2^31, 2^30, 2^29, ..., 2^1, 2^0.] 


Figure 8. 32-Bit Integer Format 


Operations 


All eight floating-point ALU operations discussed in the 
Functional Description section can be performed in IEEE 
mode. Various exceptional aspects of the R PLUS S, R MINUS 
S, R TIMES S, 2 MINUS S, INT-TO-FP, and FP-TO-INT 
operations for this mode are described below. The IEEE-TO- 
DEC and DEC-TO-IEEE operations are discussed separately 
in the IEEE-TO-DEC AND DEC-TO-IEEE Operations section. 




Operations with NANs: NANs arise in two ways: 1) they can 
be generated by the Am29C325 to indicate that an invalid 
operation has taken place (e.g., ∞ x 0), or 2) they can be provided by 
the user as an input operand. There are two types of NANs, 
signalling and quiet (see Figure 7 for formats). 


Signalling NANs set the invalid operation flag when they 
appear as an input operand to an operation. They are useful 
for indicating uninitialized variables, or for implementing 
user-designed extensions to the operations provided. The ALU 
never produces a signalling NAN as the final result of an 
operation. 


Quiet NANs are generated for invalid operations. When they 
appear as an input operand, they are passed through most 
operations without setting the invalid flag, the floating-point-to- 
integer conversion operation being the exception. 


The sign of any input operand NAN is ignored. All quiet NANs 
produced as the final result of an operation have a sign of 0. 


When a NAN appears as an input operand, the final result of 
the operation is a quiet NAN that is created by taking the input 
NAN and forcing bit 22 LOW and bit 21 HIGH. If an operation 
has two NANs as input operands, the resulting quiet NAN is 
created using the NAN on the R port. 


When a quiet NAN is produced as the final result of an invalid 
operation whose input operand or operands are not NANs, the 
resulting NAN will always have the value 7FA00000₁₆. 


The NAN flag will be HIGH whenever an operation produces a 
NAN as a final result. 


Example 1: 


Suppose the floating-point addition operation is performed 
with the following input operands: 


R port: 3F800000₁₆ (1.0 * 2^0) 
S port: 7FC12345₁₆ (signalling NAN) 


Result: The signalling NAN on the S port is converted to a 
quiet NAN by forcing bit 22 LOW and bit 21 HIGH. 
The operation's final result will be 7FA12345₁₆. 
Since one of the two input operands is a signalling 
NAN, the invalid flag will be HIGH; the NAN flag will 
also be HIGH. 


Example 2: 


Suppose the floating-point multiplication operation is 
performed with the following input operands: 


R port: FFF11111₁₆ (signalling NAN) 
S port: 7FC22222₁₆ (quiet NAN) 


Result: Since both input operands are NANs, the NAN on 
the R port is chosen for output. In addition to forcing 
bit 22 LOW, the sign bit (bit 31) is set LOW (bit 21 is 
already HIGH, and need not be changed). The 
operation's final result will be 7FB11111₁₆. Since 
one of the two input operands is a signalling NAN, 
the invalid flag is HIGH; the NAN flag will also be 
HIGH. 


Example 3: 


Suppose the floating-point subtraction operation is 
performed with the following input operands: 


R port: FF800001₁₆ (quiet NAN) 
S port: 7F800000₁₆ (+∞) 


Result: To create the final result, the quiet NAN's sign bit (bit 
31) is forced LOW and bit 21 is forced HIGH (bit 22 
is already LOW, and need not be changed). The final 
result will be 7FA00001₁₆. The NAN flag will be 
HIGH. 
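
The bit manipulation used in the three examples can be checked with a short Python sketch. It models only the result word for two-operand operations with at least one NAN input; the function names are hypothetical and the flags are not modeled.

```python
def is_nan(word: int) -> bool:
    """A NAN has an all-ones biased exponent and a non-zero fraction."""
    return ((word >> 23) & 0xFF) == 0xFF and (word & 0x7FFFFF) != 0


def quiet_nan_result(r: int, s: int) -> int:
    """Form the quiet NAN delivered when R and/or S is a NAN.

    The NAN on the R port takes priority; its sign is ignored (forced to 0),
    bit 22 is forced LOW, and bit 21 is forced HIGH.
    """
    source = r if is_nan(r) else s
    if not is_nan(source):
        raise ValueError("neither operand is a NAN")
    result = source & 0x7FFFFFFF      # sign bit (bit 31) LOW
    result &= ~(1 << 22)              # bit 22 LOW
    result |= 1 << 21                 # bit 21 HIGH
    return result
```

Applied to the operand pairs of Examples 1 through 3, this yields 7FA12345₁₆, 7FB11111₁₆, and 7FA00001₁₆ respectively.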


Operations with Denormalized Numbers: The proposed 
IEEE standard incorporates denormalized numbers to allow a 
means of gradual underflow for operations that produce non-zero 
results too small to be expressed as a normalized 
floating-point number. The Am29C325 does not support 
gradual underflow. If a floating-point operation produces a 
non-zero rounded result that is not large enough to be 
expressed as a normalized floating-point number, the final 
result will be a zero of the same sign; the inexact, underflow, 
and zero flags will be HIGH. If an input operand is a 
denormalized number, the floating-point ALU will assume that 
operand to be a zero of the same sign. 
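
The treatment of denormalized input operands can be sketched in Python as follows. The function name is hypothetical, and the sketch only shows the substitution of a same-signed zero described above.

```python
def flush_denormal_input(word: int) -> int:
    """Replace a denormalized input operand with a zero of the same sign."""
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 0 and f != 0:
        return word & 0x80000000      # keep only the sign bit
    return word
```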


Operations Producing Overflows: If an operation has a finite 
input operand or operands, and if the operation produces a 
rounded result that is too large to fit in the destination format, 
the operation is said to have overflowed. 


A floating-point overflow occurs if an R PLUS S, R MINUS S, R 
TIMES S, or 2 MINUS S operation with finite input operand(s) 
produces a result which, after rounding, has a magnitude 
greater than or equal to 2^128. Positive or negative infinity will 
appear as the final result if the rounded result is positive or 
negative, respectively, and the overflow and inexact flags will 
be HIGH. 


Integer overflow occurs when the floating-point-to-integer 
conversion operation attempts to convert a number which, 
after rounding, is greater than 2^31 - 1 or less than -2^31. The 
final result will be quiet NAN 7FA00000₁₆, and the invalid 
operation and NAN flags will be HIGH. Note that the overflow 
and inexact flags remain LOW for integer overflow. 
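
The integer overflow rule can be written as a small Python check. This is a sketch under the assumption that rounding has already been carried out; the function name and the tuple layout are hypothetical.

```python
QUIET_NAN = 0x7FA00000


def fp_to_int_final(rounded: int) -> tuple[int, bool, bool]:
    """Return (final result word, invalid flag, NAN flag) for a rounded integer."""
    if rounded > 2**31 - 1 or rounded < -(2**31):
        return QUIET_NAN, True, True          # integer overflow
    # In range: deliver the 32-bit two's-complement word.
    return rounded & 0xFFFFFFFF, False, False
```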


Operations Producing Underflows: If an operation produces 
a floating-point rounded result having a magnitude too small to 
be expressed as a normalized floating-point number, but 
greater than zero, that operation is said to have underflowed. 
Underflow occurs when an R PLUS S, R MINUS S, or R 
TIMES S operation produces a result which, after rounding, 
has a magnitude in the range: 


0 < magnitude < 2^-126 


In such cases, the final result will be +0 (00000000₁₆) if the 
rounded result is non-negative, and -0 (80000000₁₆) if the 
rounded result is negative. The underflow, inexact, and zero 
flags will be HIGH. 


Underflow does not occur if the destination format is integer. If 
the infinitely precise result of a floating-point-to-integer 
conversion has a magnitude greater than 0 and less than 1, but 
the rounded result is 0, the underflow flag remains LOW. 


Operations with Infinities: In most cases, positive and 
negative infinity are valid inputs for the R PLUS S, R MINUS S, 
R TIMES S, and 2 MINUS S operations. Those cases for which 
infinities are not valid inputs for these operations are listed in 
Table 4. 


Infinities in IEEE mode can be handled either as projective or 
affine. The projective mode is selected when PROJ/AFF is 
HIGH; the affine mode is selected when PROJ/AFF is LOW. 
The only differences between the modes that are relevant to 
Am29C325 operation occur during the addition and subtraction 
of infinities: 


Operation   | Affine Mode | Projective Mode 
(+∞) + (+∞) | Output +∞   | Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(-∞) + (-∞) | Output -∞   | Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(+∞) - (-∞) | Output +∞   | Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
(-∞) - (+∞) | Output -∞   | Output 7FA00000₁₆ (quiet NAN), set invalid and NAN flags 
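
The difference between the two modes for R PLUS S can be sketched in Python as below. The function name is hypothetical, only the case of two infinite operands is covered, and the returned boolean stands for both the invalid and NAN flags.

```python
POS_INF = 0x7F800000
NEG_INF = 0xFF800000
QUIET_NAN = 0x7FA00000


def add_infinities(r: int, s: int, projective: bool) -> tuple[int, bool]:
    """R PLUS S for two infinite operands: (final result, invalid/NAN flags set)."""
    if r not in (POS_INF, NEG_INF) or s not in (POS_INF, NEG_INF):
        raise ValueError("both operands must be infinities")
    if r != s:
        return QUIET_NAN, True        # (+inf) + (-inf) is invalid in both modes
    if projective:
        return QUIET_NAN, True        # like-signed sum is invalid in projective mode
    return r, False                   # affine mode: the infinity is passed through
```

R MINUS S behaves the same way once the sign of the S operand is inverted.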


If an R PLUS S, R MINUS S, or 2 MINUS S operation has 
infinity as an input operand or operands, the final result, if 
valid, is presumed to be exact. For example, adding +∞ and 
2.0 will produce a final result of +∞; since the result is 
considered exact, the inexact flag remains LOW. 


Invalid Operations: If an input operand is invalid for the 
operation to be performed, that operation is considered 
invalid. When an invalid operation is performed, the floating- 
point ALU produces a quiet NAN as the final result, and the 
invalid operation flag goes HIGH. Table 4 lists the cases for 
which the invalid flag is HIGH in IEEE mode, and the final 
results produced for these operations. 


TABLE 4. IEEE MODE INVALID OPERATIONS 


Operation | Input Operand | Final Result 
R PLUS S | (+∞) + (-∞) or (-∞) + (+∞) | 7FA00000₁₆ (quiet NAN) 
R PLUS S | (+∞) + (+∞) or (-∞) + (-∞) (Note 1) | 7FA00000₁₆ (quiet NAN) 
R MINUS S | (+∞) - (+∞) or (-∞) - (-∞) | 7FA00000₁₆ (quiet NAN) 
R MINUS S | (+∞) - (-∞) or (-∞) - (+∞) (Note 1) | 7FA00000₁₆ (quiet NAN) 
R TIMES S | (+0) * (+∞) or (+0) * (-∞) or (-0) * (+∞) or (-0) * (-∞) | 7FA00000₁₆ (quiet NAN) 
R PLUS S, R MINUS S, R TIMES S | R or S is a signalling NAN | (Note 2) 
2 MINUS S | S is a signalling NAN | (Note 2) 
FP-TO-INT | R is a signalling or quiet NAN | (Note 2) 
FP-TO-INT | R > 2^31 - 1 or R < -(2^31) | 7FA00000₁₆ (quiet NAN) 


Notes: 1. These cases are invalid in projective mode only. 
       2. Results for these operations are described in the Operations 
          with NANs section. 


The Sign Bit 


For most floating-point operations, the sign bit of the final 
result is unambiguous; i.e., there is only one sign bit value that 
yields a numerically correct result. Operations that produce an 
infinitely precise result of zero, however, present a problem, as 
the IEEE floating-point format allows for representation of both 
+0 and -0. The following rules can be used to determine the 
signs of zero produced in such cases. 





R PLUS S: The operations +x + (-x) and -x + (+x) produce a 
final result of zero; the sign of the zero is dependent on the 
rounding mode: 


Rounding Mode    | Sign of Final Result 
Round to nearest | + 
Round toward -∞  | - 
Round toward +∞  | + 
Round toward 0   | + 





Operations +0 + (-0) and -0 + (+0) produce a result of 0, 
with the sign of the result determined by the table above. 


The operation +0 + (+0) produces a final result of +0; the 
operation -0 + (-0) produces a final result of -0. 


R MINUS S: The operations +x - (+x) and -x - (-x) produce a 
final result of zero; the sign of the zero is dependent on the 
rounding mode: 


Rounding Mode    | Sign of Result 
Round to nearest | + 
Round toward -∞  | - 
Round toward +∞  | + 
Round toward 0   | + 









Operations +0 - (+0) and -0 - (-0) produce a result of 0, with 
the sign of the result determined by the table above. 


The operation +0 - (-0) produces a final result of +0; the 
operation -0 - (+0) produces a final result of -0. 


R TIMES S: The sign of any multiplication result other than a 
NAN is the exclusive OR of the signs of the input operands. 
Therefore, if x is non-negative, 

+0 times +x produces a final result of +0, 

+0 times -x produces a final result of -0, 

-0 times +x produces a final result of -0, 

-0 times -x produces a final result of +0. 


2 MINUS S: If S equals 2, the final result is -0 for the round 
toward -∞ mode, and +0 for all other rounding modes. 
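
The sign rules for zero results can be condensed into a short Python sketch. It assumes that the R PLUS S case follows the same pattern stated explicitly for 2 MINUS S (a negative zero only in the round toward -∞ mode); the function names are hypothetical, and sign bits are passed as 0 or 1.

```python
def zero_sum_sign(sign_r: int, sign_s: int, round_toward_minus_inf: bool) -> int:
    """Sign bit of an exactly-zero R PLUS S result.

    For R MINUS S, invert sign_s before calling.
    """
    if sign_r == sign_s:
        return sign_r                 # +0 + (+0) = +0; -0 + (-0) = -0
    # Operands of opposite sign cancel: the rounding mode decides (assumed rule).
    return 1 if round_toward_minus_inf else 0


def product_sign(sign_r: int, sign_s: int) -> int:
    """Sign bit of any non-NAN R TIMES S result: exclusive OR of the input signs."""
    return sign_r ^ sign_s
```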


Rounding 


Rounding is performed whenever an operation produces an 
infinitely precise result that cannot be represented exactly in 
the destination format. For example, suppose a floating-point 
operation produces the infinitely precise result: 


1.10101010101010101010101\01 x 2^3. 


In this example, the fraction portion of the mantissa has 25 
bits; the IEEE floating-point format can accommodate only 23. 
The backslash (\) in the mantissa represents the boundary 
between the first 23 bits of the fraction and any remaining bits. 
Rounding is the process by which this result is approximated 
by a representation that fits the destination format. 


There are four rounding modes in IEEE mode: 1) round to 
nearest, 2) round toward +∞, 3) round toward -∞, and 4) 
round toward 0. The rounding mode is chosen using the 
rounding mode select lines, RND0 and RND1. Table 5 lists the 
select states needed to obtain the desired rounding mode. 


TABLE 5. ROUNDING MODE SELECT 


RND1 | RND0 | Rounding Mode 
0    | 0    | Round to nearest 
0    | 1    | Round toward -∞ 
1    | 0    | Round toward +∞ 
1    | 1    | Round toward 0 
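

For an integer destination format, the four rounding modes can be sketched in Python with exact rational arithmetic. The mode names and the function name are hypothetical labels chosen for this illustration; the sketch does not model the RND0 and RND1 select lines.

```python
from fractions import Fraction
import math


def round_to_integer(x: Fraction, mode: str) -> int:
    """Round an infinitely precise result to an integer in the given mode."""
    if mode == "nearest":
        return round(x)               # exact ties go to the even integer (LSB of zero)
    if mode == "minus_inf":
        return math.floor(x)          # closest integer less than or equal to x
    if mode == "plus_inf":
        return math.ceil(x)           # closest integer greater than or equal to x
    if mode == "zero":
        return math.trunc(x)          # discard the fractional part
    raise ValueError(f"unknown rounding mode: {mode}")
```

For example, the value -(2^10 + 2^0 + 2^-1) rounds to -(2^10 + 2) in the nearest and toward -∞ modes, and to -(2^10 + 1) in the toward +∞ mode.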


Round to Nearest: In this rounding mode the infinitely precise 
result of an operation is rounded to the closest representation 
that fits in the destination format. If the infinitely precise result 
is exactly halfway between two representations, it is rounded 
to the representation having an LSB of zero. Rounding is 
performed both for floating-point and integer destination 
formats. 


Figure 9 illustrates four examples of the round-to-nearest 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is represented 
by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 

In Figure 9(a), the infinitely precise result of an operation is: 

2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 

The result is rounded to the closest representable floating-point 
value, 

2^20 + 2^-3 = 1.00000000000000000000001 x 2^20 


Example 2: 

In Figure 9(b), the infinitely precise result of an operation is: 

2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 

This result is rounded to the closest representable floating-point 
value, 

2^20 - 2^-4 = 1.11111111111111111111111 x 2^19 


Example 3: 

In Figure 9(c), the infinitely precise result of an operation is: 

-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 

This result is exactly halfway between two representable 
floating-point values. Accordingly, it is rounded to the 
closest representation with an LSB of zero, or 

-(2^20 + 2*2^-3) = -1.00000000000000000000010 x 2^20 


Example 4: 

In Figure 9(d), the infinitely precise result of an operation is: 

2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 

This result can be represented exactly in the floating-point 
format, and is left unaltered by the rounding process. 


[Figure 9 number lines: a) round to 2^20 + 2^-3; b) round to 2^20 - 2^-4; c) round to -(2^20 + 2*2^-3); d) no change] 


Figure 9. Floating-Point Rounding Examples for Round-to-Nearest Mode 


Figure 10 illustrates four examples of the round-to-nearest 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be represented exactly in the 
integer format. 


Example 1: 

In Figure 10(a), the infinitely precise result of an operation is: 

2^10 - 2^-2 = 00...001111111111.11 

The result is rounded to the closest representable integer 
value, 

2^10 = 00...010000000000 


Example 2: 

In Figure 10(b), the infinitely precise result of an operation is: 

2^10 + 2^0 + 2^-3 = 00...010000000001.001 

This result is rounded to the closest representable integer 
value, 

2^10 + 2^0 = 00...010000000001 


Example 3: 

In Figure 10(c), the infinitely precise result of an operation is: 

-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 

This result is exactly halfway between two representable 
integer values. Accordingly, it is rounded to the closest 
representation with an LSB of zero, or 

-(2^10 + 2*2^0) = 11...101111111110 


Example 4: 

In Figure 10(d), the infinitely precise result of an operation is: 

2^10 + 3*2^0 = 00...010000000011 

This result can be represented exactly in the integer format, 
and is left unaltered by the rounding process. 


[Figure 10 number lines: a) round to 2^10; b) round to 2^10 + 1; c) round to -(2^10 + 2); d) no change] 


Figure 10. Integer Rounding Examples for Round-to-Nearest Mode 


Round Toward -∞: In this rounding mode the result of an 
operation is rounded to the closest representation that is less 
than or equal to the infinitely precise result, and which fits the 
destination format. Rounding is performed both for floating-point 
and integer destination formats. 


Figure 11 illustrates four examples of the round toward -∞ 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is represented 
by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1: 

In Figure 11(a), the infinitely precise result of an operation is: 

2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000\11 x 2^20 

This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 

2^20 = 1.00000000000000000000000 x 2^20 


Example 2: 

In Figure 11(b), the infinitely precise result of an operation is: 

2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111\0001 x 2^19 

This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 

2^20 - 2^-4 = 1.11111111111111111111111 x 2^19 


Example 3: 

In Figure 11(c), the infinitely precise result of an operation is: 

-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001\1 x 2^20 

This result cannot be represented exactly in floating-point 
format, and is rounded to the next-smaller floating-point 
representation: 

-(2^20 + 2*2^-3) = -1.00000000000000000000010 x 2^20 


Example 4: 

In Figure 11(d), the infinitely precise result of an operation is: 

2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20 

This result can be represented exactly in the floating-point 
format, and is left unaltered by the rounding process. 


[Figure 11 number lines: a) round to 2^20; b) round to 2^20 - 2^-4; c) round to -(2^20 + 2*2^-3); d) no change] 


Figure 11. Floating-Point Rounding Examples for Round Toward -∞ Mode 


Figure 12 illustrates four examples of the round toward -∞ 
process for operations having an integer destination format. 
The infinitely precise result of an operation is represented by 
an "X" on the number line; the black dots on the number line 
indicate those values that can be exactly represented in the 
integer format. 


Example 1: 

In Figure 12(a), the infinitely precise result of an operation is: 

2^10 - 2^-2 = 00...001111111111.11 

The result is rounded to the next-smaller representable 
integer value, 

2^10 - 2^0 = 00...001111111111 


Example 2: 

In Figure 12(b), the infinitely precise result of an operation is: 

2^10 + 2^0 + 2^-3 = 00...010000000001.001 

This result is rounded to the next-smaller representable 
integer value, 

2^10 + 2^0 = 00...010000000001 


Example 3: 

In Figure 12(c), the infinitely precise result of an operation is: 

-(2^10 + 2^0 + 2^-1) = 11...101111111110.1 

This result is rounded to the next-smaller representable 
integer value: 

-(2^10 + 2*2^0) = 11...101111111110 


Example 4: 

In Figure 12(d), the infinitely precise result of an operation is: 

2^10 + 3*2^0 = 00...010000000011 

This result can be represented exactly in the integer format, 
and is unaltered by the rounding process. 


[Figure 12 number lines: a) round to 2^10 - 1; b) round to 2^10 + 1; c) round to -(2^10 + 2); d) no change] 


Figure 12. Integer Rounding Examples for Round Toward -∞ Mode 


Round Toward +∞: In this rounding mode the result of an 
operation is rounded to the closest representation that is 
greater than or equal to the infinitely precise result, and which 
fits the destination format. Rounding is performed both for 
floating-point and integer destination formats. 


Figure 13 illustrates four examples of the round toward +∞ 
process for operations having a floating-point destination 
format. The infinitely precise result of an operation is represented 
by an "X" on the number line; the black dots on the 
number line indicate those values that can be represented 
exactly in the floating-point format. 


Example 1:

In Figure 13(a), the infinitely precise result of an operation is:

2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000|11 x 2^20

This result cannot be represented exactly in floating-point
format, and is rounded to the next-larger floating-point
representation:

2^20 + 2^-3 = 1.00000000000000000000001 x 2^20

Example 2:

In Figure 13(b), the infinitely precise result of an operation is:

2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111|0001 x 2^19

This result cannot be represented exactly in floating-point
format, and is rounded to the next-larger floating-point
representation:

2^20 = 1.00000000000000000000000 x 2^20

Example 3:

In Figure 13(c), the infinitely precise result of an operation is:

-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001|1 x 2^20

This result cannot be represented exactly in floating-point
format, and is rounded to the next-larger floating-point
representation:

-(2^20 + 2^-3) = -1.00000000000000000000001 x 2^20

Example 4:

In Figure 13(d), the infinitely precise result of an operation is:

2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20

This result can be represented exactly in the floating-point
format; no rounding takes place.
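The directed rounding modes can be checked numerically. The following Python sketch is illustrative only and is not taken from the data sheet: the function name `round_directed`, the helper `p2`, and the mode strings are hypothetical, exact rational arithmetic stands in for the "infinitely precise result", and exponent-range limits (overflow and underflow) are ignored.

```python
from fractions import Fraction
import math

def p2(k: int) -> Fraction:
    """Exact power of two, 2**k, for positive or negative k."""
    return Fraction(2) ** k

def round_directed(x: Fraction, mode: str) -> Fraction:
    """Round an exact value to a 24-bit mantissa (1 implied bit + 23 fraction bits).

    mode is "+inf", "-inf" or "zero" (hypothetical names for the three
    directed rounding modes). Exponent limits are not modeled.
    """
    if x == 0:
        return x
    mag = abs(x)
    # Find e such that 2**e <= |x| < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if p2(e) > mag:
        e -= 1
    lsb = p2(e - 23)          # weight of the mantissa LSB at this exponent
    units = x / lsb           # exact value measured in LSBs
    if mode == "+inf":
        n = math.ceil(units)
    elif mode == "-inf":
        n = math.floor(units)
    elif mode == "zero":
        n = math.trunc(units)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return n * lsb

# Figure 13(a): 2^20 + 2^-4 + 2^-5 rounds up to 2^20 + 2^-3
print(round_directed(p2(20) + p2(-4) + p2(-5), "+inf") == p2(20) + p2(-3))
```

Working in units of one LSB reduces each mode to an integer ceiling, floor, or truncation, which mirrors the number-line pictures in the figures.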


[Figure 13 number lines: a) 2^20 + 2^-4 + 2^-5, round to 2^20 + 2^-3; b) 2^20 - 2^-4 + 2^-8, round to 2^20; c) -(2^20 + 2^-3 + 2^-4), round to -(2^20 + 2^-3); d) 2^20 + 3*2^-3, no change]

Figure 13. Floating-Point Rounding Examples for Round Toward +∞ Mode

Figure 14 illustrates four examples of the round toward +∞
process for operations having an integer destination format. The
infinitely precise result of an operation is represented by an
"X" on the number line; the black dots on the number line
indicate those values that can be exactly represented in the
integer format.

Example 1:

In Figure 14(a), the infinitely precise result of an operation is:

2^10 - 2^-2 = 00...001111111111.11

The result is rounded to the next-larger representable
integer value:

2^10 = 00...010000000000

Example 2:

In Figure 14(b), the infinitely precise result of an operation is:

2^10 + 2^0 + 2^-3 = 00...010000000001.001

This result is rounded to the next-larger representable
integer value:

2^10 + 2*2^0 = 00...010000000010

Example 3:

In Figure 14(c), the infinitely precise result of an operation is:

-(2^10 + 2^0 + 2^-1) = 11...101111111110.1

This result is rounded to the next-larger representable
integer value:

-(2^10 + 2^0) = 11...101111111111

Example 4:

In Figure 14(d), the infinitely precise result of an operation is:

2^10 + 3*2^0 = 00...010000000011

This result can be represented exactly in the integer
format; no rounding takes place.

[Figure 14 number lines: a) 2^10 - 2^-2, round to 2^10; b) 2^10 + 2^0 + 2^-3, round to 2^10 + 2; c) -(2^10 + 2^0 + 2^-1), round to -(2^10 + 1); d) 2^10 + 3*2^0, no change]

Figure 14. Integer Rounding Examples for Round Toward +∞ Mode

Round Toward 0: In this rounding mode the result of an
operation is rounded to the closest representation whose
magnitude is less than or equal to the infinitely precise result,
and which fits the destination format. Rounding is performed
both for floating-point and integer destination formats.


Figure 15 illustrates four examples of the round toward 0
process for operations having a floating-point destination
format. The infinitely precise result of an operation is
represented by an "X" on the number line; the black dots on the
number line indicate those values that can be represented
exactly in the floating-point format.

Example 1:

In Figure 15(a), the infinitely precise result of an operation is:

2^20 + 2^-4 + 2^-5 = 1.00000000000000000000000|11 x 2^20

This result cannot be represented exactly in floating-point
format, and is rounded to:

2^20 = 1.00000000000000000000000 x 2^20

Example 2:

In Figure 15(b), the infinitely precise result of an operation is:

2^20 - 2^-4 + 2^-8 = 1.11111111111111111111111|0001 x 2^19

This result cannot be represented exactly in floating-point
format, and is rounded to:

2^20 - 2^-4 = 1.11111111111111111111111 x 2^19

Example 3:

In Figure 15(c), the infinitely precise result of an operation is:

-(2^20 + 2^-3 + 2^-4) = -1.00000000000000000000001|1 x 2^20

This result cannot be represented exactly in floating-point
format, and is rounded to:

-(2^20 + 2^-3) = -1.00000000000000000000001 x 2^20

Example 4:

In Figure 15(d), the infinitely precise result of an operation is:

2^20 + 3*2^-3 = 1.00000000000000000000011 x 2^20

This result can be represented exactly in the floating-point
format, and is unaffected by the rounding process.


[Figure 15 number lines: a) 2^20 + 2^-4 + 2^-5, round to 2^20; b) 2^20 - 2^-4 + 2^-8, round to 2^20 - 2^-4; c) -(2^20 + 2^-3 + 2^-4), round to -(2^20 + 2^-3); d) 2^20 + 3*2^-3, no change]

Figure 15. Floating-Point Rounding Examples for Round Toward 0 Mode

Figure 16 illustrates four examples of the round toward 0
process for operations having an integer destination format.
The infinitely precise result of an operation is represented by
an "X" on the number line; the black dots on the number line
indicate those values that can be exactly represented in the
integer format.

Example 1:

In Figure 16(a), the infinitely precise result of an operation is:

2^10 - 2^-2 = 00...001111111111.11

The result is rounded to:

2^10 - 2^0 = 00...001111111111

Example 2:

In Figure 16(b), the infinitely precise result of an operation is:

2^10 + 2^0 + 2^-3 = 00...010000000001.001

The result is rounded to:

2^10 + 2^0 = 00...010000000001

Example 3:

In Figure 16(c), the infinitely precise result of an operation is:

-(2^10 + 2^0 + 2^-1) = 11...101111111110.1

The result is rounded to:

-(2^10 + 2^0) = 11...101111111111

Example 4:

In Figure 16(d), the infinitely precise result of an operation is:

2^10 + 3*2^0 = 00...010000000011

This result can be represented exactly in the integer format,
and is unaffected by the rounding process.
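For an integer destination the three directed modes reduce to ceiling, floor, and truncation of the exact value. The Python sketch below is an illustration under that assumption; `round_to_integer` and `twos_complement_bits` are hypothetical helper names, and integer overflow is not modeled.

```python
from fractions import Fraction
import math

def round_to_integer(x: Fraction, mode: str) -> int:
    """Directed rounding of an exact value to an integer.

    "+inf" -> smallest integer >= x, "-inf" -> largest integer <= x,
    "zero" -> integer of largest magnitude not exceeding |x|.
    """
    rounders = {"+inf": math.ceil, "-inf": math.floor, "zero": math.trunc}
    return rounders[mode](x)

def twos_complement_bits(n: int) -> str:
    """32-bit two's-complement bit pattern of n, MSB first."""
    return format(n & 0xFFFFFFFF, "032b")

x = -(Fraction(2) ** 10 + 1 + Fraction(1, 2))    # -(2^10 + 2^0 + 2^-1)
for mode in ("+inf", "-inf", "zero"):
    n = round_to_integer(x, mode)
    print(mode, n, twos_complement_bits(n))
```

For negative inputs, round toward +∞ and round toward 0 give the same result, which is why Figures 14(c) and 16(c) both round to -(2^10 + 1).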


[Figure 16 number lines: a) 2^10 - 2^-2, round to 2^10 - 1; b) 2^10 + 2^0 + 2^-3, round to 2^10 + 1; c) -(2^10 + 2^0 + 2^-1), round to -(2^10 + 1); d) 2^10 + 3*2^0, no change]

Figure 16. Integer Rounding Examples for Round Toward 0 Mode


Flag Operation 


The Am29C325 generates six status flags to monitor floating- 
point processor operation. The following is a summary of flag 
conventions in IEEE mode: 


Invalid Operation Flag: The invalid operation flag is HIGH
when an input operand is invalid for the operation to be
performed. Table 4 lists the cases for which the invalid
operation flag is HIGH in IEEE mode, and the corresponding
final result. In cases where the invalid operation flag is HIGH,
the overflow, underflow, zero, and inexact flags are LOW; the
NAN flag will be HIGH.


Overflow Flag: The overflow flag is HIGH if an R PLUS S, R
MINUS S, R TIMES S, or 2 MINUS S operation with finite input
operand(s) produces a result which, after rounding, has a
magnitude greater than or equal to 2^128. The final result will
be +∞ or -∞.


Underflow Flag: The underflow flag is HIGH if an R PLUS S,
R MINUS S, or R TIMES S operation produces a result which,
after rounding, has a magnitude in the range:

0 < magnitude < 2^-126.


The final result will be +0 (00000000₁₆) if the rounded result is
non-negative, and -0 (80000000₁₆) if the rounded result is
negative.


Inexact Flag: The inexact flag is HIGH if the final result of an
R PLUS S, R MINUS S, R TIMES S, 2 MINUS S, INT-TO-FP, or
FP-TO-INT operation is not equal to the infinitely precise
result. Note that if the underflow or overflow flag is HIGH, the
inexact flag will also be HIGH.


Zero Flag: The zero flag is HIGH if the final result of an
operation is zero. For operations producing an IEEE floating-
point number, the flag accompanies outputs +0 (00000000₁₆)
and -0 (80000000₁₆). For operations producing an integer,
the flag accompanies the output 0 (00000000₁₆).


NAN Flag: The NAN flag is HIGH if an R PLUS S, R MINUS S,
R TIMES S, 2 MINUS S, or FP-TO-INT operation produces a
NAN as a final result.


Operation in DEC Mode 


When input signal IEEE/DEC is LOW, the DEC mode of
operation is selected. In this mode the Am29C325 uses the
single-precision floating-point format (floating F) set forth in
Digital Equipment Corporation's VAX Architecture Manual. In
addition, the DEC mode complies with most other aspects of
single-precision floating-point operation outlined in the
manual; differences are discussed in Appendix B.


DEC Floating-Point Format 


The DEC single-precision floating-point word is 32 bits wide, 
and is arranged in the format shown in Figure 17. The floating- 
point word is divided into three fields: a single-bit sign, an 8-bit 
biased exponent, and a 23-bit fraction. 


The sign bit indicates the sign of the floating-point number's 
value. Non-negative values have a sign of 0, negative values a 
sign of 1. 


The biased exponent is an 8-bit unsigned integer field
representing a multiplicative factor of some power of two. The bias
value is 128. If, for example, the multiplicative factor for a
floating-point number is to be 2^a, the value of the biased
exponent would be a + 128; "a" is called the true exponent.


The fraction is a 23-bit unsigned fractional field containing the
23 LSBs of the floating-point number's 24-bit mantissa. The
weight of this field's MSB is 2^-2; the weight of the LSB is 2^-24.


A floating-point number is evaluated or interpreted per the 
following conventions: 


let s = sign bit
    e = biased exponent
    f = fraction


if e = 0 and s = 0 ... value = 0
if e = 0 and s = 1 ... value = DEC-reserved operand
if 0 < e ≤ 255 ... value = (-1)^s * (2^(e-128)) * (.1f)
(normalized number)
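The evaluation rules above can be sketched in Python as follows. This is an illustration, not part of the data sheet: the function name `decode_dec` is hypothetical, the field positions (sign in bit 31, biased exponent in bits 30-23, fraction in bits 22-0) follow the Am29C325's DEC word layout, and a DEC-reserved operand is reported as `None`.

```python
from fractions import Fraction
from typing import Optional

def decode_dec(word: int) -> Optional[Fraction]:
    """Evaluate a 32-bit Am29C325 DEC-mode word; None means DEC-reserved operand."""
    s = (word >> 31) & 0x1        # sign bit
    e = (word >> 23) & 0xFF       # biased exponent, bias 128
    f = word & 0x7FFFFF           # 23-bit fraction
    if e == 0:
        return Fraction(0) if s == 0 else None
    # Mantissa .1f: implied leading 1 has weight 2^-1, fraction LSB has weight 2^-24.
    mantissa = Fraction((1 << 23) | f, 1 << 24)
    return (-1) ** s * mantissa * Fraction(2) ** (e - 128)

print(decode_dec(0x41600000))   # the word used for +3.5 in this section
print(decode_dec(0xC2360000))   # the word used for -11.375 in this section
```

Exact fractions are used so that the decoded value can be compared with the binary expansions in the worked examples without floating-point rounding in the host language.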


Zero: The value zero always has a sign of zero.


DEC-Reserved Operand: A DEC-reserved operand does not
represent a numeric value, but is interpreted as a signal or
symbol. DEC-reserved operands are used to indicate invalid
operations and operations whose results have overflowed the
destination format. They may also be used to pass symbolic
information from one calculation to another.


Normalized Number: A normalized number represents a
quantity with magnitude greater than or equal to 2^-128 but
less than 2^127.


Example 1:


The number +3.5 can be represented in floating-point
format as follows:


+3.5 = 11.1₂ x 2^0
     = .111₂ x 2^2


sign = 0


biased exponent = 2₁₀ + 128₁₀ = 130₁₀
                = 10000010₂


fraction = 11000000000000000000000₂
(the leading 1 is implied in the format)


Concatenating these fields produces the floating-point word
41600000₁₆.


Example 2:


The number -11.375 can be represented in floating-point
format as follows:


-11.375 = -1011.011₂ x 2^0
        = -.1011011₂ x 2^4


sign = 1


biased exponent = 4₁₀ + 128₁₀ = 132₁₀
                = 10000100₂


fraction = 01101100000000000000000₂
(the leading 1 is implied in the format)


Concatenating these fields produces the floating-point word
C2360000₁₆.


[Figure 17 layout: bit 31 = sign bit (S); bits 30-23 = biased exponent (E), bit weights 2^7 to 2^0; bits 22-0 = fraction (F), bit weights 2^-2 to 2^-24. VALUE = (-1)^S (2^(E-128)) (.1F)]

Figure 17. DEC-Mode Floating-Point Format


DEC Mode Integer Format


DEC mode integer format is identical to that of the IEEE mode.
Integer numbers are represented as 32-bit, two's-complement
words (Figure 8 depicts the integer format). The integer word
can represent a range of integer values from -2^31 to 2^31 - 1.


Operations


All eight floating-point ALU operations discussed in the
General Description section can be performed in DEC mode.
Various exceptional aspects of the R PLUS S, R MINUS S, R
TIMES S, 2 MINUS S, INT-TO-FP, and FP-TO-INT operations
for this mode are described below. The IEEE-TO-DEC and
DEC-TO-IEEE operations are discussed separately in the
IEEE-TO-DEC and DEC-TO-IEEE Operations section.


Operations with DEC-Reserved Operands: DEC-reserved
operands arise in two ways: 1) they can be generated by the
Am29C325 to indicate that an invalid operation or floating-point
overflow has taken place, or 2) they can be provided by the user
as an input operand.


When a DEC-reserved operand appears as an input operand,
the final result of the operation is the same DEC-reserved
operand. If an operation has two DEC-reserved operands as
inputs, the DEC-reserved operand on the R port becomes the
final result.


The NAN flag will be HIGH whenever an operation produces a
DEC-reserved operand as a final result.











Example 1:


Suppose the floating-point addition operation is performed
with the following input operands:


R port: 40800000₁₆ (0.1₂ x 2^1)
S port: 80012345₁₆ (DEC-reserved operand)


Result: This operation produces the DEC-reserved operand
on the S port, 80012345₁₆, as the final result. The
NAN flag will be HIGH.


Example 2:


Suppose the floating-point multiplication operation is
performed with the following input operands:


R port: 80765432₁₆ (DEC-reserved operand)
S port: 80000001₁₆ (DEC-reserved operand)


Result: Since both input operands are DEC-reserved
operands, the operand on the R port, 80765432₁₆, is the
final result of the operation. The NAN flag will be
HIGH.


Operations Producing Overflows: If an operation produces
a rounded result that is too large to fit in the destination
format, that operation is said to have overflowed.


A floating-point overflow occurs if an R PLUS S, R MINUS S, R
TIMES S, or 2 MINUS S operation with finite input operand(s)
produces a result which, after rounding, has a magnitude
greater than or equal to 2^127. The final result in such cases will
be DEC-reserved operand 80000000₁₆; the overflow, inexact,
and NAN flags will be HIGH.


Integer overflow occurs when the "floating-point-to-integer"
conversion operation attempts to convert to integer a
floating-point number which, after rounding, is greater than
2^31 - 1 or less than -2^31. The final result in such cases will be
DEC-reserved operand 80000000₁₆; the invalid operation flag will
be HIGH. Note that the overflow and inexact flags remain
LOW for integer overflow.


Operations Producing Underflows: If an operation produces
a floating-point result which, after rounding, has a magnitude
too small to be expressed as a normalized floating-point
number, but greater than 0, that operation is said to have
underflowed. Underflow occurs when an R PLUS S, R MINUS
S, or R TIMES S operation produces a result which, after
rounding, has the magnitude:


0 < magnitude < 2^-128


The final result in such cases will be 0 (00000000₁₆). The
underflow, inexact, and zero flags will be HIGH.


Underflow does not occur if the destination format is integer. If
the infinitely precise result of a floating-point-to-integer
conversion has a magnitude greater than 0 and less than 1, but
the rounded result is 0, the underflow flag remains LOW.


Invalid Operations: If an input operand is invalid for the
operation to be performed, that operation is considered
invalid. There is only one invalid operation in DEC mode:
performing a floating-point-to-integer conversion on a value
too large to be converted to an integer. In this case, the final
result will be DEC-reserved operand 80000000₁₆, and the
invalid operation and NAN flags will be HIGH.


Sign Bit


For all operations producing a DEC floating-point result, the
sign bit of the final result is unambiguous; i.e., there is only one
sign bit value that yields a numerically correct result.


Rounding 


There are four rounding modes for DEC operation: 1) round to
nearest, 2) round toward +∞, 3) round toward -∞, and 4)
round toward 0. The round toward +∞, round toward -∞, and
round toward 0 modes are performed in a manner identical to
that for IEEE operation; refer to the Rounding section under
Operation in IEEE Mode. The round to nearest mode is
similar to that for IEEE operation, but differs in one respect: for
the case in which the infinitely precise result of an operation is
exactly halfway between two representable values, DEC round
to nearest mode rounds to the value with the larger
magnitude, rather than to the value whose LSB is 0.
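The halfway-case difference between the two round to nearest modes can be illustrated with a short Python sketch. It is not taken from the data sheet: `round_nearest` is a hypothetical helper, and the value being rounded is expressed in units of one destination LSB so that representable values are exactly the integers.

```python
from fractions import Fraction
import math

def round_nearest(units: Fraction, dec_mode: bool) -> int:
    """Round a value given in units of one LSB to the nearest integer.

    Ties go to the larger magnitude in DEC mode and to the value whose
    LSB is 0 (the even integer) in IEEE mode.
    """
    lower = math.floor(units)
    frac = units - lower
    half = Fraction(1, 2)
    if frac != half:
        return lower + 1 if frac > half else lower
    if dec_mode:
        # Halfway case: pick the neighbor farther from zero.
        return lower + 1 if units > 0 else lower
    # Halfway case: pick the neighbor whose LSB is 0.
    return lower if lower % 2 == 0 else lower + 1

for value in (Fraction(5, 2), Fraction(7, 2), Fraction(-5, 2)):
    print(value, round_nearest(value, dec_mode=True), round_nearest(value, dec_mode=False))
```

The two modes agree on every input except exact ties, so the difference only shows up when the discarded bits are exactly 100...0.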


Flag Operation 


The Am29C325 generates six status flags to monitor floating- 
point processor operation. The following is a summary of flag 
operation in DEC mode: 


Invalid Operation Flag: The invalid operation flag is HIGH if
the FP-TO-INT operation is performed on a floating-point
number too large to be converted to an integer. The final result
for such an operation will be the DEC-reserved operand
80000000₁₆.


Overflow Flag: The overflow flag is HIGH if an R PLUS S, R
MINUS S, R TIMES S, or 2 MINUS S operation produces a
result which, after rounding, has a magnitude greater than or
equal to 2^127. The final result will be the DEC-reserved
operand 80000000₁₆.


Underflow Flag: The underflow flag is HIGH if an R PLUS S,
R MINUS S, or R TIMES S operation produces a result which,
after rounding, has a magnitude in the range:


0 < magnitude < 2^-128.

The final result will be 0 (00000000₁₆) in such cases.


Inexact Flag: The inexact flag is HIGH if the final result of an
R PLUS S, R MINUS S, R TIMES S, 2 MINUS S, INT-TO-FP, or
FP-TO-INT operation is not equal to the infinitely precise
result. Note that if the underflow or overflow flag is HIGH, the
inexact flag will also be HIGH.


Zero Flag: The zero flag is HIGH if the final result of an
operation is 0. For operations producing an integer or a DEC
floating-point number, the flag accompanies the output 0
(00000000₁₆). (It should be noted that any operation producing
a floating-point 0 in DEC mode will output 00000000₁₆.)


NAN Flag: The NAN flag is HIGH if an R PLUS S, R MINUS S,
R TIMES S, 2 MINUS S, or FP-TO-INT operation produces a
DEC-reserved operand as the final result.


IEEE-TO-DEC and DEC-TO-IEEE Operations 


The IEEE-TO-DEC and DEC-TO-IEEE operations are used to 
convert floating-point numbers between the IEEE and DEC 
formats. Both operations work in a manner independent of the 
IEEE/DEC mode control. 


IEEE-TO-DEC Conversion 


This operation converts an IEEE floating-point number to DEC
floating-point format. Most conversions are exact; in no case
does the round mode have any effect on the final result. There
are, however, a few exceptional cases:


a) If the IEEE floating-point input has a magnitude greater than
or equal to 2^127, it is too large to be represented by a DEC
floating-point number. The final result will be the
DEC-reserved operand 80000000₁₆; the overflow, inexact, and
NAN flags will be HIGH.

b) If the IEEE floating-point input is a NAN, the final result will
be the DEC-reserved operand 80000000₁₆; the invalid and
NAN flags will be HIGH.


c) If the IEEE floating-point input is a denormalized number,
the final result will be a DEC 0 (00000000₁₆); the zero flag
will be HIGH.


d) If the IEEE floating-point input is +0 or -0, the final result
will be a DEC 0 (00000000₁₆); the zero flag will be HIGH.
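The conversion and its exceptional cases can be modeled at the bit level. The following Python sketch is an illustration built from the format definitions in this data sheet, not a description of the chip's internal logic: `ieee_to_dec` and the flag names are hypothetical, and treating an IEEE infinity as case (a) is an assumption, since the text only speaks of magnitudes greater than or equal to 2^127.

```python
def ieee_to_dec(word: int):
    """Convert a 32-bit IEEE single-precision word to an Am29C325 DEC-mode word.

    Returns (result_word, flags), where flags is a set of flag names
    that would be HIGH. Flag names are illustrative only.
    """
    sign = word & 0x80000000
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 0xFF and f != 0:
        # Case (b): NAN input.
        return 0x80000000, {"invalid", "nan"}
    if e >= 254:
        # Case (a): magnitude >= 2^127 (infinity is assumed to land here too).
        return 0x80000000, {"overflow", "inexact", "nan"}
    if e == 0:
        # Cases (c) and (d): denormalized number or zero of either sign.
        return 0x00000000, {"zero"}
    # IEEE value 1.f x 2^(e-127) equals .1f x 2^(e-126), so the DEC biased
    # exponent is (e - 126) + 128 = e + 2; the fraction bits are unchanged.
    return sign | ((e + 2) << 23) | f, set()

print(hex(ieee_to_dec(0x40600000)[0]))   # IEEE word for +3.5
```

Because both formats carry the same 23 stored fraction bits, an ordinary conversion only adjusts the exponent field, which is consistent with the statement that the round mode never affects the result.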


DEC-TO-IEEE Conversion 


This operation converts a DEC floating-point number to IEEE 
floating-point format. Most conversions are exact; in no case 
does the round mode have any effect on the final result. There 
are, however, a few exceptional cases: 


a) If the DEC floating-point input is not 0, but has a magnitude
less than 2^-126, it is too small to be expressed as a
normalized IEEE floating-point number. The final result will
be an IEEE floating-point 0 having the same sign as the
input (00000000₁₆ for positive inputs and 80000000₁₆ for
negative inputs); the underflow, inexact, and zero flags will
be HIGH.


b) If the DEC floating-point input is a DEC-reserved operand,
the result will be quiet NAN 7FA00000₁₆; the invalid
operation and NAN flags will be HIGH.


c) If the DEC floating-point input is 0, the final result will be
IEEE floating-point +0 (00000000₁₆); the zero flag will be
HIGH.

APPENDIX A


DIFFERENCES BETWEEN THE IEEE 
PROPOSED STANDARD FOR BINARY 
FLOATING-POINT ARITHMETIC AND THE 
Am29C325'S IEEE MODE 


When operated in IEEE mode, the Am29C325 High-Speed 
Floating-Point Processor complies with the single-precision 
portion of the IEEE Proposed Standard for Binary Floating- 
Point Arithmetic (P754, draft 10.0) in most respects. There are, 
however, several differences: 


Denormalized Numbers 


The Am29C325 does not handle denormalized numbers. A
denormalized input will be converted to zero of the same sign
before the specified operation takes place. The operation
proceeds in exactly the same manner as if the input were +0
or -0, producing the same numerical result and flags.


If the result of an operation, after rounding, has a magnitude
smaller than 2^-126, the result is replaced by a zero of the
same sign.


Representation of Overflows 


In some rounding modes the proposed IEEE standard requires 
that overflows be represented as the format's most-positive or 
most-negative finite number. In particular: 


- When rounding toward 0, all overflows should produce a
result of the largest representable finite number with the
sign of the intermediate result.


- When rounding toward -∞, all positive overflows should
produce a result of the largest representable positive finite
number.


- When rounding toward +∞, all negative overflows should
produce a result of the largest representable negative finite
number.


The Am29C325, however, always represents positive
overflows as +∞ and negative overflows as -∞, regardless of
rounding mode.


Projective Mode 


The proposed IEEE standard provides only for an affine mode
to control the handling of infinities. The Am29C325 provides
both affine and projective modes; the desired mode can be
selected by the user.


Traps


The proposed IEEE standard stipulates that the user be able 
to request a trap on any exception. The Am29C325 does not 
support trapped operation, and behaves as if traps are 
disabled. 


Resetting of Flags 


The proposed IEEE standard states that once an exception 
flag has been set, it is reset only at the user's request. The 
Am29C325's flags, however, reflect the status of the most 
recent operation. 


Generation of the Underflow Flag 


The proposed IEEE standard suggests several possible crite- 
ria for determining if underflow occurs. These criteria generate 
underflow flags that differ in subtle ways. The underflow 
criteria chosen for the Am29C325 stipulate that underflow 
occurs if: 


a) the rounded result of an operation has a magnitude in the
range:


0 < magnitude < 2^-126,


and

b) the final result is not equal to the infinitely precise result.


Since the Am29C325 never produces a denormalized number 
as the final result of a calculation, condition (b) is true 
whenever (a) is true. Note then that the operation of the 
Am29C325's underflow flag is somewhat different than that of 
an "IEEE standard" system using the same underflow criteria. 
For example, if an operation should produce an infinitely
precise result that is exactly 2^-127, an "IEEE standard"
system would produce that value as the final result, expressed
as a denormalized number. Since that system's final result is 
exact, the underflow flag would remain LOW. The Am29C325, 
on the other hand, would output zero; since its final result is 
not exact, the underflow flag would be HIGH. 


APPENDIX B 


DIFFERENCES BETWEEN DEC VAX AND 
Am29C325 DEC MODE 


Operation in DEC mode complies with most aspects of single- 
precision floating-point operation outlined in the Digital Equip- 
ment Corporation's VAX Architecture Manual. However, there 
are some differences that should be noted: 


Format 
The Am29C325's DEC format is:


sign     - bit 31
exponent - bits 30-23
mantissa - bits 22-0


The VAX format is:


sign     - bit 15
exponent - bits 14-7
mantissa - bits 6-0, bits 31-16


In both cases, fields are listed from MSB to LSB, with bit 31 
the MSB of the 32-bit word. The Am29C325's DEC format can 
be converted to VAX format by swapping the 16 LSBs and 16 
MSBs of the 32-bit word. 
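The half-word swap can be written in a few lines of Python. This is a minimal sketch of the operation described above; the function name `swap_halves` is hypothetical.

```python
def swap_halves(word: int) -> int:
    """Exchange the 16 MSBs and 16 LSBs of a 32-bit word.

    Applied to an Am29C325 DEC-mode word this yields the VAX layout
    (sign in bit 15, exponent in bits 14-7); applying it again
    converts back, since the swap is its own inverse.
    """
    word &= 0xFFFFFFFF
    return ((word << 16) | (word >> 16)) & 0xFFFFFFFF

print(hex(swap_halves(0x41600000)))   # Am29C325 DEC-mode word for +3.5
```

In the swapped word the sign bit of the Am29C325 format (bit 31) lands in bit 15 and the exponent field (bits 30-23) lands in bits 14-7, matching the VAX field positions listed above.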


Flags vs. Exceptions 


In DEC VAX operation, certain unusual conditions arising 
during system operation may incur an exception, or an 
indication to the operating system that special handling is 
needed. 


The VAX recognizes a number of arithmetic exceptions. The 
following exceptions are relevant to the operations supported 
by the Am29C325: 


Integer Overflow Trap: indicates that the last operation 
produced an integer overflow. The LSBs of the correct result 
are stored in the destination operand. 


Floating-Point Overflow Trap/Fault: indicates that the last
operation produced, after normalization and rounding, a
floating-point number with magnitude greater than or equal to
2^127. A trap replaces the destination operand with the
DEC-reserved operand 80000000₁₆; a fault leaves the destination
operand unchanged.


Floating-Point Underflow Trap/Fault: indicates that the last
operation produced, after normalization and rounding, a
floating-point number with magnitude less than 2^-128. A trap
replaces the destination operand with zero; a fault leaves the
destination operand unchanged.


Reserved Operand Fault: indicates that the last operation
had a reserved operand as an input. The destination operand
is unchanged.


The Am29C325 does not directly support DEC traps and
faults. Rather, it indicates unusual conditions by setting one or
more of the six status flags HIGH. Table D2 describes flag
operation in DEC mode.


Integer Overflow 


In cases of integer overflow, the VAX signals the integer
overflow trap and stores the LSBs of the correct result. The
Am29C325 sets the invalid operation flag and outputs the
DEC-reserved operand 80000000₁₆.


Floating-Point Underflow/Overflow Operation


The VAX Architecture Manual specifies the action to be taken 
on the destination operand when floating-point underflow or 
overflow is encountered. The Am29C325 has no immediate 
control over this destination operand, as it resides somewhere 
off-chip, either in a register or memory location. This isn't so 
much a difference between the VAX specification and 
Am29C325 operation as it is a difference in scope. 


The Am29C325 responds to floating-point underflow by
producing a final result of 0 (00000000₁₆); the underflow, inexact,
and zero flags will be HIGH. It responds to floating-point
overflow by producing the DEC-reserved operand 80000000₁₆
as the final result; the overflow, inexact, and NAN flags will be
HIGH.


Handling of DEC-Reserved Operands 


If an operation has a DEC-reserved operand as an input, the 
Am29C325 will produce that operand as the final result. If an 
operation has two input arguments and both are DEC- 
reserved operands, the operand on port R becomes the final 
result. For the VAX, operations with a DEC-reserved operand 
input or inputs do not modify the destination operand. As 
mentioned above, control of the destination operand is be- 
yond the scope of the Am29C325's operation. 


Inexact Flag 


The Am29C325 provides an inexact flag to indicate that the
final result produced by an operation is not equal to the
infinitely precise result. The VAX does not provide this flag.

APPENDIX C


PERFORMING FLOATING-POINT DIVISION
ON THE Am29C325


While the Am29C325 does not have a floating-point division 
instruction, it can be used to evaluate reciprocals. The 
division: 


C=A/B 


can then be performed by evaluating: 


C = A*(1/B) 


Only a modest amount of external hardware is needed to 
implement the reciprocal function. 


The technique for calculating reciprocals is based on the 
Newton-Raphson method for obtaining the roots of an equa- 
tion. The roots of equation: 


F(x) = 0 


can be found by iteratively evaluating the equation:


x_{i+1} = x_i - F(x_i)/F'(x_i)


The process begins by making a guess as to the value of x,
and using this guess or "seed" value to perform the first
iteration. Iterations are continued until the root is evaluated to
the desired accuracy. The number of iterations needed to
achieve a given accuracy depends both on the accuracy of the
seed value and the nature of F(x).


Now consider the equation: 


F(x) = (1/x) - B 


The root of F(x) is 1/B. The reciprocal of B, then, can be found
by using the Newton-Raphson method to find the root of F(x).
The iterative equation for finding the root is:


x_{i+1} = x_i - F(x_i)/F'(x_i)
        = x_i - (1/x_i - B)/(-1/x_i^2)
        = x_i (2 - B*x_i)


It can be shown that, in order for this iterative equation to
converge, the seed value x_0 must fall in the range:


0 < x_0 < 2/B       if B > 0
or 2/B < x_0 < 0    if B < 0


For example, if the reciprocal of 3 is to be evaluated, the seed
value must be between 0 and 2/3.


The error of x_i reduces quadratically; that is, if the error of x_i is
e, the error is reduced to order e^2 by the next iteration. The
number of bits of accuracy in the result, then, roughly doubles
after every iteration. While this is only an approximation of the
actual error produced, it is a handy rule of thumb for
determining the number of iterations needed to produce a
result of a certain accuracy, given the accuracy of the seed.
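The iteration x_{i+1} = x_i (2 - B*x_i) is easy to try out in software. The Python sketch below is illustrative only; `reciprocal_iterates` is a hypothetical helper, and exact rational arithmetic is used so that the iterates are free of host rounding error (the Am29C325 itself would round each product and difference to single precision).

```python
from fractions import Fraction

def reciprocal_iterates(b, seed, iterations):
    """Return [x_0, x_1, ..., x_n] for the Newton-Raphson reciprocal iteration."""
    b = Fraction(b)
    x = Fraction(seed)
    iterates = [x]
    for _ in range(iterations):
        x = x * (2 - b * x)       # x_{i+1} = x_i (2 - B*x_i)
        iterates.append(x)
    return iterates

b = Fraction("7.25")
for i, x in enumerate(reciprocal_iterates(b, Fraction("0.1"), 3)):
    error = x - 1 / b             # signed error x_i - 1/B
    print(i, float(x), float(error))
```

Each iteration costs one multiplication (B*x_i), one 2 MINUS S operation, and one further multiplication, which is why the part can evaluate reciprocals without a divide instruction.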


Example 1:

Find the reciprocal of 7.25.

Solution:

The seed value must fall in the range:


0 < x_0 < 2/7.25
or 0 < x_0 < .275862


Suppose x_0 is chosen to be .1:


Iteration 1: x_1 = x_0 (2 - B*x_0)
                 = .1 (2 - (7.25)(.1))
                 = .1275


Iteration 2: x_2 = x_1 (2 - B*x_1)
                 = .1275 (2 - (7.25)(.1275))
                 = .1371421875


Iteration 3: x_3 = x_2 (2 - B*x_2)
                 = .1371421875 (2 - (7.25)(.1371421875))
                 = .1379265230


The actual value of 1/7.25, to ten decimal places, is
.1379310345.


The error after each iteration is:


Iteration    x_i             Error to Ten Places
0            .1              -.0379310345
1            .1275           -.0104310345
2            .1371421875     -.0007888470
3            .1379265230     -.0000045115


Example 2:

Find the reciprocal of -0.3.

Solution:

The seed value must fall in the range:


2/(-0.3) < x_0 < 0
or -6.66 < x_0 < 0

Suppose x_0 is chosen to be -2.0:

Iteration 1: x_1 = x_0 (2 - B*x_0)
                 = -2.0 (2 - (-0.3)(-2.0))
                 = -2.8

Iteration 2: x_2 = x_1 (2 - B*x_1)
                 = -2.8 (2 - (-0.3)(-2.8))
                 = -3.248

Iteration 3: x_3 = x_2 (2 - B*x_2)
                 = -3.248 (2 - (-0.3)(-3.248))
                 = -3.3311488

Iteration 4: x_4 = x_3 (2 - B*x_3)
                 = -3.3311488 (2 - (-0.3)(-3.3311488))
                 = -3.333331902


The actual value of 1/(-0.3), to ten decimal places, is
-3.3333333333.


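The error after each iteration is:


Iteration    x_i             Error to Ten Places
0            -2.0            1.3333333333
1            -2.8            .5333333333
2            -3.248          .0853333333
3            -3.3311488      .0021845333
4            -3.333331902    .0000014317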


In order to implement the Newton-Raphson method on the
Am29C325, some means is needed to generate the seed used
in the first iteration. One approach is to place a hardware seed
look-up table between the R bus and the Am29C325; see
Figure C1. A more detailed diagram of the look-up table
appears in Figure C2.

TABLE C1. CONTENTS OF THE SEED EXPONENT PROM


DEC section                  IEEE section
Address (16)   Data (16)     Address (16)   Data (16)

000            (Note 1)      100            (Note 1)
001            (Note 1)      101            FC
002            FF            102            FB
003            FE            103            FA
004            FD            104            F9
005            FC            105            F8
006            FB            106            F7
007            FA            107            F6
...            ...           ...            ...
0FD            04            1FD            (Note 2)
0FE            03            1FE            (Note 2)
0FF            02            1FF            (Note 2)




Notes: 1. The reciprocals of these numbers are too large to be represented in the 
selected format. 
2. The reciprocals of these numbers are too small to be represented in 
normalized IEEE format. 
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Figure C1. Adding a Hardware Look-Up Table to the Am29C325 


The look-up table has two sections: a biased exponent look-up
PROM and a fraction look-up PROM. The seed biased exponent
look-up table is stored in a 512-by-8-bit PROM. This table
consists of two sections: the DEC format section (which
occupies addresses 000-0FF hex) and the IEEE section (which
occupies addresses 100-1FF hex). The appropriate table will be
selected automatically if address line A8 is wired to the
Am29C325's IEEE/DEC pin. The equations implemented by these
table sections are:

DEC table: seed biased exponent
= 257 (decimal) - input biased exponent

IEEE table: seed biased exponent
= 253 (decimal) - input biased exponent

Table C1 lists the contents of this PROM.
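
As a rough illustration, the two equations can be turned into a PROM image with a few lines of Python. The function name is hypothetical, and the handling of the Note 1 and Note 2 locations (left as None here) is an assumption; the actual contents programmed into those locations are not given in the text.

```python
def seed_exponent_prom():
    """Build a 512-entry seed exponent table; None marks Note 1 / Note 2 entries."""
    prom = [None] * 512
    for address in range(512):
        ieee = bool(address & 0x100)      # A8 HIGH selects the IEEE section
        exponent = address & 0xFF         # A7-A0 carry the input biased exponent
        seed = (253 if ieee else 257) - exponent
        # Assumption: an input biased exponent of 0 and any seed exponent
        # outside 1..255 are treated as not representable.
        if exponent != 0 and 1 <= seed <= 255:
            prom[address] = seed
    return prom

prom = seed_exponent_prom()
for address in (0x002, 0x003, 0x101, 0x102):
    print(f"{address:03X}  {prom[address]:02X}")
```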


The seed fraction look-up table is stored in one or more
PROMs, the number of PROMs depending on the desired
accuracy of the seed value. The hardware depicted in Figure
C2 uses two 4K-by-8-bit PROMs to implement a fraction
look-up table whose inputs are the 12 MSBs of the input
argument's fraction. These PROMs output the 16 MSBs of the
seed's fraction field; the remaining 7 bits of fraction are
set to 0. The equation implemented in this table is:

seed fraction = 2 / (1 + input fraction) - 1

where the value of the input fraction falls in the range

0 ≤ input fraction < 1

Note that the seed fraction must also be constrained to fall in
the range

0 < seed fraction < 1

Therefore, if the input fraction is 0, the corresponding seed
fraction stored in the table must be .111...111 (binary), not
1.0 (binary). The same seed fraction look-up table may be used
for both IEEE and DEC formats. Table C2 contains a partial
listing for the seed fraction look-up table shown in Figure C2.
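
A minimal sketch of how the seed fraction entries could be generated is given here. The function names are hypothetical, and truncation to 16 bits is an assumption; the text does not say whether the PROM words are truncated or rounded.

```python
def seed_fraction(address):
    """Ideal seed fraction for a 12-bit address (the 12 MSBs of the input fraction)."""
    input_fraction = address / 4096.0            # 0 <= input fraction < 1
    return 2.0 / (1.0 + input_fraction) - 1.0

def seed_fraction_word(address):
    """16-bit PROM word holding the 16 MSBs of the seed fraction."""
    # Assumption: truncate to 16 bits. An input fraction of 0 would give 1.0,
    # so that entry is clamped to all ones (.111...111 binary).
    return min(int(seed_fraction(address) * 65536), 0xFFFF)

for address in (0x000, 0x001, 0x002, 0xFFF):
    print(f"{address:03X}  {seed_fraction(address):.10f}  {seed_fraction_word(address):04X}")
```

The upper byte of each word corresponds to seed fraction bits R22-R15 and the lower byte to R14-R7.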


TABLE C2. CONTENTS OF THE SEED FRACTION PROMS

Address (16)   Value of Input    Value of Seed     PROM Outputs (16)
               Fraction (10)     Fraction (10)     R22-R15   R14-R7

000            0.0000000000      0.9999999999      (see text)
001            0.0002441406      0.9995118370      FF
002            0.0004882812      0.9990239150      FF
003            0.0007324219      0.9985362280      FF
004            0.0009765625      0.9980487790      FF
005            0.0012207031      0.9975615710      FF
006            0.0014648438      0.9970745970
007            0.0017089844      0.9965878630
008            0.0019531250      0.9961013650
009            0.0021972656      0.9956151030
00A            0.0024414063      0.9951290800
00B            0.0026855469      0.9946432920
00C            0.0029296875      0.9941577400
 .                  .                 .
 .                  .                 .
FF6            0.9975585938      0.0012221950
FF7            0.9978027344      0.0010998410
FF8            0.9980468750      0.0009775170
FF9            0.9982910156      0.0008552230
FFA            0.9985351563      0.0007329590
FFB            0.9987792969      0.0006107240
FFC            0.9990234375      0.0004885200
FFD            0.9992675781      0.0003663450
FFE            0.9995117188      0.0002442000
FFF            0.9997558594      0.0001220850


(Figure C2 labels: IEEE/DEC drives A8 and the BIASED EXPONENT
(R30-R23) drives A7-A0 of the Am27S15 512 x 8 SEED EXPONENT
PROM, whose outputs D7-D0 form the SEED EXPONENT; the 12 MSBs
OF FRACTION (R22-R11) address the (2) Am27S43 4K x 8 SEED
FRACTION PROMs, which supply the SEED FRACTION; SEED SIGN.)

Figure C2. The Hardware Look-Up Table 


With the hardware look-up table in place, the reciprocal of 
value B can be calculated with the following series of 
operations: 


1) Place B on both the R and S buses. The 2:1 multiplexer at
   the output of the hardware look-up table should select the
   output of the look-up table (see Figure C3-A).

2) Load the seed value x0 into register R and load B into
   register S. Select the R TIMES S operation (see Figure
   C3-B).

3) Load product B*x0 into register F. Select the 2 MINUS S
   operation, and select register F as the input to the ALU S
   port (see Figure C3-C).

4) Load 2 - B*x0 into register F. Select the R TIMES S
   operation and select register F as the input to the ALU S
   port (see Figure C3-D).

5) Load the value x1 (x1 = x0 (2 - B*x0)) into registers R and F.
   Select the R TIMES S operation (see Figure C3-E).

6) Repeat steps 3 through 5 until the result has the accuracy
   desired.
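
The register-level sequence in steps 1 through 6 can be mimicked in software. The sketch below is a behavioral approximation only: it assumes round-to-nearest single-precision results at every step, and the names to_single, reg_r, reg_s, and reg_f are illustrative rather than taken from the data sheet.

```python
import struct

def to_single(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def reciprocal(b, seed, iterations=2):
    reg_r = to_single(seed)   # steps 1-2: seed x0 from the look-up table into R
    reg_s = to_single(b)      # steps 1-2: B into S
    for _ in range(iterations):
        reg_f = to_single(reg_r * reg_s)   # R TIMES S; product B*x(i) loaded into F
        reg_f = to_single(2.0 - reg_f)     # 2 MINUS S with F on the S port
        reg_f = to_single(reg_r * reg_f)   # R TIMES S with F on the S port
        reg_r = reg_f                      # step 5: x(i+1) loaded into R and F
    return reg_r

print(reciprocal(25.3, 0.0395278919))
print(reciprocal(-0.4725, -2.1162109375))
```

Because the look-up table seed is already accurate to roughly four decimal digits, one or two passes through the loop reach the limit of single precision.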
Figure C3-A. Data Flow for Step 1 of the Reciprocal Procedure

Figure C3-B. Data Flow for Step 2 of the Reciprocal Procedure

Figure C3-C. Data Flow for Step 3 of the Reciprocal Procedure

Figure C3-D. Data Flow for Step 4 of the Reciprocal Procedure

Figure C3-E. Data Flow for Step 5 of the Reciprocal Procedure









A tabular description of the operations above is given in Table
C3. The following examples, performed in IEEE format,
illustrate the process.

Example 1:

Find the reciprocal of 25.3.

Solution: The IEEE floating-point representation for 25.3 is
41CA6666 (hex). The reciprocal process is begun by
feeding this value to both the seed look-up table
and port S. The look-up table produces the value
.0395278919 (3D21E800 hex). The reciprocal is
evaluated using the procedure described above;
register values for each step are given in Table C4.
The expected result, to the precision of the
floating-point word, is .0395256919 (3D21E5B1 hex). In
this case the expected result is produced after the
first iteration. All subsequent iterations produce the
same result, and are therefore unnecessary.











TABLE C3. SEQUENCE OF EVENTS FOR EVALUATING RECIPROCALS

(Table body not legible in this copy. The legible column
headings are Clock Cycle, Register R, Register S, and
Register F; the rows are grouped into a first iteration and a
second iteration. X = DON'T CARE.)



TABLE C4. INPUT BUS AND REGISTER VALUES FOR EXAMPLE 1

(Values in hexadecimal; decimal equivalents in parentheses.)

Clock
Cycle   Register R       Register S       Register F

1       3D21E800         41CA6666
        (.03952789)      (25.3)
2       3D21E800         41CA6666
        (.03952789)      (25.3)
3       3D21E800         41CA6666         3F8001D3
        (.03952789)      (25.3)           (1.0000556)
4       3D21E800         41CA6666         3F7FFC5A
        (.03952789)      (25.3)           (.99994433)
5       3D21E5B1         41CA6666         3D21E5B1     <- Result of first
        (.03952569)      (25.3)           (.03952569)     iteration
6       3D21E5B1         41CA6666         3F7FFFFF
        (.03952569)      (25.3)           (.99999994)
7       3D21E5B1         41CA6666         3F800000
        (.03952569)      (25.3)           (1.0)
8       3D21E5B1         41CA6666         3D21E5B1     <- Result of second
        (.03952569)      (25.3)           (.03952569)     iteration





Example 2:

Find the reciprocal of -0.4725.

Solution: The IEEE floating-point representation for -0.4725
is BEF1EB85 (hex). The reciprocal process is begun
by feeding this value to both the seed look-up table
and port S. The look-up table produces the value
-2.1162109449 (C0077000 hex). The reciprocal is
evaluated using the procedure described above;
register values for each step are given in Table C5.
The expected result, to the precision of the
floating-point word, is -2.11640249 (C0077322 hex). In
this case the expected result is produced after the
first iteration. All subsequent iterations produce the
same result, and are therefore unnecessary.





TABLE C5. INPUT BUS AND REGISTER VALUES FOR EXAMPLE 2

(Values in hexadecimal; decimal equivalents in parentheses.)

Clock
Cycle   Register R       Register S       Register F

1       C0077000         BEF1EB85
        (-2.1162109)     (-0.4725)
2       C0077000         BEF1EB85
        (-2.1162109)     (-0.4725)
3       C0077000         BEF1EB85         3F7FFA14
        (-2.1162109)     (-0.4725)        (0.99990963)
4       C0077000         BEF1EB85         3F8002F6
        (-2.1162109)     (-0.4725)        (1.0000904)
5       C0077322         BEF1EB85         C0077322     <- Result of first
        (-2.116402)      (-0.4725)        (-2.116402)     iteration
6       C0077322         BEF1EB85         3F800000
        (-2.116402)      (-0.4725)        (1.0)
7       C0077322         BEF1EB85         3F800000
        (-2.116402)      (-0.4725)        (1.0)
8       C0077322         BEF1EB85         C0077322     <- Result of second
        (-2.116402)      (-0.4725)        (-2.116402)     iteration


APPENDIX D 
SUMMARY OF FLAG OPERATION 


Tables D1, D2, and D3 summarize flag operation for the IEEE 
mode, the DEC mode, and for the IEEE-TO-DEC and DEC-TO- 
IEEE operations. 


TABLE D1. FLAG SUMMARY FOR IEEE MODE

Flag columns: INV, OVF, UNF, INE, ZER, NAN. (The flag states
for the rows below are not legible in this copy.)

Operation        Condition(s)

(not legible)    ... listed in the IEEE Invalid
                 Operations Table

R PLUS S         Input operands are finite
R MINUS S        |rounded result| > 2^128
R TIMES S
2 MINUS S

R PLUS S         0 < |rounded result| < 2^-126
R MINUS S
R TIMES S

R PLUS S         Final result does not equal
R MINUS S        infinitely precise result
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S         Final result is zero
R MINUS S
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S         Final result is a NAN
R MINUS S
R TIMES S
2 MINUS S
FP-TO-INT

Notes: INV = Invalid operation flag
       OVF = Overflow flag
       UNF = Underflow flag
       INE = Inexact flag
       ZER = Zero flag
       NAN = NAN flag
       L = LOW
       H = HIGH
       * = State of flag depends on the input operands
           and the operation performed
TABLE D2. FLAG SUMMARY FOR DEC MODE

Flag columns: INV, OVF, UNF, INE, ZER, NAN. (Flag states are
shown only where legible in this copy.)

Operation        Condition(s)                       INV OVF UNF INE ZER NAN

FP-TO-INT        Rounded result > 2^31 - 1           H   L   L   L   L   H
                 or rounded result < -2^31

FP-TO-INT        Input is a DEC-reserved             L   L   L   L   L   H
                 operand

R PLUS S         |Rounded result| > 2^127
R MINUS S
R TIMES S
2 MINUS S

R PLUS S         0 < |rounded result| < 2^-128
R MINUS S
R TIMES S

R PLUS S         Final result does not equal
R MINUS S        infinitely precise result
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S         Final result is zero
R MINUS S
R TIMES S
2 MINUS S
INT-TO-FP
FP-TO-INT

R PLUS S         Final result is a DEC-reserved
R MINUS S        operand
R TIMES S
2 MINUS S
FP-TO-INT

Notes: INV = Invalid operation flag
       OVF = Overflow flag
       UNF = Underflow flag
       INE = Inexact flag
       ZER = Zero flag
       NAN = NAN flag
       L = LOW
       H = HIGH
       * = State of flag depends on the input operands
           and the operation performed


TABLE D3. FLAG SUMMARY FOR IEEE-TO-DEC AND DEC-TO-IEEE CONVERSIONS

Flag columns: INV, OVF, UNF, INE, ZER, NAN. (The flag states
and most rows are not legible in this copy.)

Operation        Condition(s)

IEEE-TO-DEC      (not legible)

DEC-TO-IEEE      Final result is zero
IEEE-TO-DEC

Notes: INV = Invalid operation flag
       OVF = Overflow flag
       UNF = Underflow flag
       INE = Inexact flag
       ZER = Zero flag
       NAN = NAN flag
       L = LOW
       H = HIGH
       * = State of flag depends on the input operands
           and the operation performed
ABSOLUTE MAXIMUM RATINGS

Storage Temperature ........................ -65 to +150°C
Case Temperature Under Bias ................ -55 to +125°C
Supply Voltage to Ground Potential
  Continuous ............................... -0.3 to +7.0 V
DC Voltage Applied to Outputs
  for HIGH Output State .................... -0.3 V to +VCC + 0.3 V
DC Input Voltage ........................... -0.3 to VCC + 0.3 V
DC Output Current, into LOW Outputs ........ 30 mA
DC Input Current ........................... -10 to +10 mA

Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.

OPERATING RANGES

Commercial (C) Devices
  Temperature, Case (TA) ................... 0 to +70°C
  Supply Voltage (VCC) ..................... +4.75 to +5.25 V

Military (M) Devices
  Temperature (TA) ......................... -55 to +125°C
  Supply Voltage (VCC) ..................... +4.5 V to +5.5 V

Operating ranges define those limits between which the
functionality of the device is guaranteed.


DC CHARACTERISTICS over operating range unless otherwise specified (for APL Products, Group A,
Subgroups 1, 2, 3 are tested unless otherwise noted)

(Limit values and several rows are not legible in this copy; the legible entries are listed below.)

Parameter   Parameter
Symbol      Description                       Test Conditions

VOH         Output HIGH Voltage
--          (not legible)                     ... Y-BUS, 4 mA for All Other Pins
VIL         Guaranteed Input Logical
            LOW Voltage (Note 2)
--          (not legible)                     VIN = VCC - 0.5 V
IOZH        (not legible)                     VCC = Max., VO = 2.4 V
IOZL        (not legible)                     VCC = Max., VO = 0.5 V
ICC         Static Power Supply Current       VCC = Max., VIN = VCC or GND, IO = 0 µA
CPD         Power Dissipation Capacitance     VCC = 5.0 V, TA = 25°C, No Load      pF Typical
            (Note 3)

Notes: 1. VCC conditions shown as Min. or Max. refer to the commercial and military VCC limits.
       2. These input levels provide zero-noise immunity and should only be statically tested in a noise-free environment (not functionally tested).
       3. CPD determines the no-load dynamic current consumption:
          ICC (Total) = ICC (Static) + CPD * VCC * f, where f is the switching frequency of the majority of the internal nodes, normally one-half of
          the clock frequency.
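
The relation in Note 3 is simple enough to evaluate directly. The sketch below uses made-up numbers purely for illustration; the static current, CPD, and clock frequency shown are assumptions and are not data-sheet limits.

```python
def total_supply_current(icc_static, cpd, vcc, f):
    """ICC (Total) = ICC (Static) + CPD * VCC * f, in SI units (A, F, V, Hz)."""
    return icc_static + cpd * vcc * f

# Hypothetical values for illustration only.
clock_hz = 10e6
icc = total_supply_current(icc_static=0.010, cpd=500e-12, vcc=5.0, f=clock_hz / 2)
print(f"{icc * 1e3:.1f} mA")
```

Here f is taken as one-half of the clock frequency, as Note 3 suggests for the majority of internal nodes.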
SWITCHING CHARACTERISTICS over COMMERCIAL operating range

Speed grades: 29C325, 29C325-1, 29C325-2. (Most limit values, and some numbers and symbols, are not legible in this copy; "--" marks an illegible entry.)

No.  Symbol    Parameter Description

1    tASC      Clocked Add, Subtract Time (R PLUS S, R MINUS S, 2 MINUS S)
2    --        Clocked Multiply Time (R TIMES S)
3    --        Clocked Conversion Time (INT-TO-FP, FP-TO-INT, IEEE-TO-DEC, DEC-TO-IEEE)
4    --        Unclocked Add, Subtract Time (R, S to F, Flags) for R PLUS S, R MINUS S, and 2 MINUS S Instructions
5    tMUC      Unclocked Multiply Time (R, S to F, Flags) for R TIMES S Instruction
6    tCUC      Unclocked Conversion Time (R, S to F, Flags) for INT-TO-FP, FP-TO-INT, IEEE-TO-DEC and DEC-TO-IEEE Instructions
7    --        Clock Pulse Width HIGH
8    tPWL      Clock Pulse Width LOW
--   tPDOF1    Clock to F0-F31 and Flag Outputs
--   --        OE Enable Time, Z to LOW
12   --        OE Enable Time, Z to HIGH
--   --        OE Disable Time, LOW to Z
--   --        Clock ↑ to F0-F15 Enable, 16-Bit I/O Mode, Z to LOW
--   --        Clock ↓ to F0-F15 Disable, 16-Bit I/O Mode, LOW to Z
--   --        Clock ↓ to F16-F31 Enable, 16-Bit I/O Mode, Z to LOW
--   --        Clock ↑ to F16-F31 Disable, 16-Bit I/O Mode, LOW to Z
--   tPHZ16    (description not legible)
23   tSCE      (description not legible)
24   tHCE      (description not legible)
25   tSD1      (description not legible)
26   tHD1      (description not legible)
27   tSD2      (description not legible); FT0 = HIGH, FT1 = LOW
28   tHD2      (description not legible)
29   tSIO2     (description not legible)
--   tHIO2     (description not legible)
31   tPDIO2    (description not legible)
32   tSI3      (description not legible)
--   --        I4 Register R Input Select Setup Time (Note 1); FT0 = LOW; 15 / 15 / 15 ns
--   tHI4      I4 Register R Input Select Hold Time (Note 1)
--   tSRM      Round Mode Select Setup Time
38   tPRF      Round Mode Select to F0-F31, Flags; FT1 = HIGH

Other test conditions legible in the Conditions column: FT0 = HIGH, FT1 = HIGH; FT0 = LOW, FT1 = HIGH; ONEBUS = LOW.

Notes: 1. See timing diagram for desired mode of operation to determine clock edge to which these setup and hold times apply.
SWITCHING CHARACTERISTICS over MILITARY operating range (for APL Products, Group A, Subgroups
9, 10, 11 are tested unless otherwise noted)

Device: 29C325. (Most limit values, and some numbers and symbols, are not legible in this copy; "--" marks an illegible entry.)

No.  Symbol    Parameter Description

1    tASC      Clocked Add, Subtract Time (R PLUS S, R MINUS S, 2 MINUS S); 145 ns
3    tCC       Clocked Conversion Time (INT-TO-FP, FP-TO-INT, IEEE-TO-DEC, DEC-TO-IEEE)
4    tASUC     Unclocked Add, Subtract Time (R, S to F, Flags) for R PLUS S, R MINUS S, and 2 MINUS S Instructions
--   tMUC      Unclocked Multiply Time (R, S to F, Flags) for R TIMES S Instruction
--   --        Unclocked Conversion Time (R, S to F, Flags) for INT-TO-FP, FP-TO-INT, IEEE-TO-DEC and DEC-TO-IEEE Instructions
8    --        Clock Pulse Width LOW
--   tPDOF1    Clock to F0-F31 and Flag Outputs
13   --        OE Disable Time
15   --        Clock ↑ to F0-F15 Enable, 16-Bit I/O Mode
16   --        (not legible)
17   --        Clock ↓ to F0-F15 Disable, 16-Bit I/O Mode
18   --        (not legible)
19   --        Clock ↓ to F16-F31 Enable, 16-Bit I/O Mode
20   tPZH16    (description not legible)
21   tPLZ16    Clock ↑ to F16-F31 Disable, 16-Bit I/O Mode
22   tPHZ16    (description not legible)
23   tSCE      (description not legible)
--   tHCE      (description not legible)
--   tHD1      (description not legible)
28   tHD2      (description not legible)
29   tSIO2     (description not legible)
--   tHIO2     (description not legible)
--   --        S Input Select Setup Time; FT1 = LOW

Other test conditions legible in the Conditions column: FT0 = HIGH, FT1 = HIGH; S16/32 = HIGH, ONEBUS = LOW; FT0 = LOW, FT1 = LOW; FT for Destination Register = LOW.

Notes: 1. See timing diagram for desired mode of operation to determine clock edge to which these setup and hold times apply.
SWITCHING TEST CIRCUITS

A. Three-State Outputs

   R1 = (5.0 - VBE - VOL) / (IOL + VOL/1K)

B. Normal Outputs

   R2 = 2.4 V / IOH

   R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2)

Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture.
       2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
       3. S1 and S3 are closed while S2 is open for tPZH test.
          S1 and S2 are closed while S3 is open for tPZL test.
       4. CL = 5.0 pF for output disable tests.
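
For readers sizing a test fixture, the load resistor expressions for the two test circuits (R1 = (5.0 - VBE - VOL) / (IOL + VOL/1K) for three-state outputs; R2 = 2.4 V / IOH and R1 = (5.0 - VBE - VOL) / (IOL + VOL/R2) for normal outputs) can be evaluated as sketched here. The function names are hypothetical, the currents are taken as magnitudes, and the numeric values are assumptions chosen only to exercise the formulas.

```python
def three_state_load(v_be, v_ol, i_ol):
    """R1 for the three-state output circuit (1K resistor to ground)."""
    return (5.0 - v_be - v_ol) / (i_ol + v_ol / 1000.0)

def normal_output_load(v_be, v_ol, i_ol, i_oh):
    """(R1, R2) for the normal output circuit; currents given as magnitudes."""
    r2 = 2.4 / i_oh
    r1 = (5.0 - v_be - v_ol) / (i_ol + v_ol / r2)
    return r1, r2

# Hypothetical values: VBE = 0.7 V, VOL = 0.5 V, IOL = 4 mA, IOH = 0.4 mA.
r1_three_state = three_state_load(0.7, 0.5, 4e-3)
r1_normal, r2_normal = normal_output_load(0.7, 0.5, 4e-3, 0.4e-3)
print(round(r1_three_state), round(r1_normal), round(r2_normal))
```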
SWITCHING TEST WAVEFORMS

[Waveform: Set-Up, Hold, and Release Times]
Notes: 1. Diagram shown for HIGH data only.
          Output transition may be opposite sense.
       2. Cross hatched area is don't care
          condition.

[Waveform: Propagation Delay]

[Waveform: Pulse Width]

[Waveform: Enable and Disable Times]
Notes: 1. Diagram shown for Input Control Enable-
          LOW and Input Control Disable-HIGH.
       2. S1, S2 and S3 of Load Circuit are closed
          except where shown.
SWITCHING WAVEFORMS
KEY TO SWITCHING WAVEFORMS

INPUTS                        OUTPUTS

MUST BE STEADY                WILL BE STEADY

MAY CHANGE FROM H TO L        WILL BE CHANGING FROM H TO L

MAY CHANGE FROM L TO H        WILL BE CHANGING FROM L TO H

DON'T CARE; ANY CHANGE        CHANGING; STATE UNKNOWN
PERMITTED

DOES NOT APPLY                CENTER LINE IS HIGH IMPEDANCE
                              "OFF" STATE

[Waveform: Clocked Operation: FT0 = LOW, FT1 = LOW]
[Waveform: Clocked Operation: FT0 = HIGH, FT1 = LOW]

[Waveform: Clocked Operation: FT0 = LOW, FT1 = HIGH]
[Waveform: Flow-Through Operation (FT0 = HIGH, FT1 = HIGH)]

[Waveform: 32-Bit, Single-Input Bus Mode]
[Waveform: 16-Bit, Two-Input Bus Mode]

Note 1. I4 has special setup and hold time requirements in this mode. All other control signals have timing
requirements as shown in the diagram "Clocked Operation: FT0 = LOW, FT1 = LOW."
INPUT/OUTPUT CIRCUIT DIAGRAMS

[Circuit diagrams: DRIVEN INPUT; OUTPUT]
Am29C327

CMOS Double-Precision Floating-Point Processor

ADVANCE INFORMATION

DISTINCTIVE CHARACTERISTICS

• High-performance double-precision floating-point processor
• Comprehensive floating-point and integer instruction sets
• Single VLSI device performs single-, double-, and
  mixed-precision operations
• Performs conversions between precisions and between
  data formats
• Compatible with industry-standard floating-point formats
  - IEEE 754 format
  - DEC F, DEC D, and DEC G formats
  - IBM system/370 format
• Exact IEEE compliance for denormalized numbers with
  no speed penalty
• Eight-deep register file for intermediate results and
  on-chip 64-bit data path facilitates compound operations;
  e.g., Newton-Raphson division, sum-of-products, and
  transcendentals
• Supports pipelined or flow-through operation
• Fabricated with Advanced Micro Devices' 1.2 micron
  CMOS process

SIMPLIFIED SYSTEM DIAGRAM

[Diagram labels: R-Port (32), S-Port (32), Operand Router,
Constants, ALU Input Multiplexer, Floating-Point & Integer ALU,
64-bit data paths, Output Multiplexer]

DEC F, DEC D, DEC G, and VAX are trademarks of the Digital Equipment Corporation.
IBM system/370 is a trademark of International Business Machines, Inc.

Publication # 09418   Rev. B   Amendment /0
Issue Date: November 1987
GENERAL DESCRIPTION


The Am29C327 double-precision floating-point processor is a
single VLSI device that implements an extensive floating-point
and integer instruction set, and can perform single-, double-,
or mixed-precision operations. The three most popular
floating-point formats (IEEE, DEC, and IBM) are supported. IEEE
operations comply with Standard 754, with direct
implementation of special features such as gradual underflow
and trap handling.


The Am29C327 consists of a 64-bit ALU, a 64-bit data path,
and a control unit. The ALU has three data input ports, and
can perform compound operations of the form (A * B) + C.
The data path comprises two 64-bit input operand registers,
an 8-by-64-bit register file for storage of intermediate results,
three operand-selection multiplexers that provide for
orthogonal selection of input operands, a 64-bit output
register, and an output multiplexer that allows access to the
32 MSBs or 32 LSBs of the result data. Control signals
determine the operation to be performed, the source of
operands, operand precision, rounding mode, and other aspects
of device operation.
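
To make the role of the output multiplexer concrete, the sketch below splits a 64-bit result into the two 32-bit words that would appear on the F bus. It assumes an IEEE double-precision result and uses the FSEL polarity given in the pin description (HIGH selects the 32 MSBs); the function name is hypothetical.

```python
import struct

def f_bus_word(result, fsel_high):
    """Return the 32-bit word selected from a 64-bit IEEE double result."""
    bits = struct.unpack(">Q", struct.pack(">d", result))[0]   # raw 64-bit pattern
    if fsel_high:
        return (bits >> 32) & 0xFFFFFFFF    # 32 MSBs
    return bits & 0xFFFFFFFF                # 32 LSBs

msw = f_bus_word(1.0, True)
lsw = f_bus_word(1.0, False)
print(f"{msw:08X} {lsw:08X}")
```

A 32-bit host would therefore read a double-precision result in two transfers, toggling the selection between them.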


Operations can be performed in either of two modes: flow- 
through or pipelined. In the flow-through mode, the ALU is 
completely combinatorial; this mode is best suited for scalar 
operations. Pipelined mode divides the ALU into one or two 
pipelined stages, for use in vector operations, as often found 
in graphics or signal processing. 


Fabricated with AMD's 1.2 micron technology, the Am29C327 
is housed in a 169-lead pin-grid-array (PGA) package. 


This document contains information on a product under development at Advanced Micro Devices, Inc. The information is intended to 
help you to evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice. 
RELATED AMD PRODUCTS

Part No.     Description

Am29C10A     CMOS Microprogram Controller
Am29C116     CMOS Minimum Power 16-Bit Microprocessor
Am29C117     CMOS Two-Port 16-Bit Microprocessor
Am29PL141    Field-Programmable Controller (FPC)
Am29C323     CMOS 32-Bit Parallel Multiplier
Am29C325     CMOS 32-Bit Floating-Point Processor
Am29C331     CMOS 16-Bit Microprogram Sequencer
Am29C332     CMOS 32-Bit Arithmetic Logic Unit
Am29C334     CMOS Four-Port Dual-Access Register File


CONNECTION DIAGRAM
169-Lead PGA*
Bottom View

[Pin-grid drawing]

*Pinout observed from pin side of package.
**Alignment pin (not connected internally).
LOGIC SYMBOL

[Logic symbol labels: RFSEL0-RFSEL2, PSEL0-PSEL3, QSEL0-QSEL3,
TSEL0-TSEL3, FSEL, I0-I13, RM0-RM2, SLAVE]


ORDERING INFORMATION
Standard Products

AMD standard products are available in several packages and operating ranges. The order number (Valid Combination) is formed by
a combination of: a. Device Number
                  b. Speed Option (if applicable)
                  c. Package Type
                  d. Temperature Range
                  e. Optional Processing

AM29C327 G C B

e. OPTIONAL PROCESSING
   Blank = Standard processing
   B = Burn-in

d. TEMPERATURE RANGE
   C = Commercial (0 to +70°C)

c. PACKAGE TYPE
   G = 169-Lead Pin Grid Array without Heatsink
       (CGX169)

b. SPEED OPTION
   Not Applicable

a. DEVICE NUMBER/DESCRIPTION
   Am29C327
   Double-Precision Floating-Point Processor


Valid Combinations

AM29C327     GC, GCB


Valid Combinations

Valid Combinations list configurations planned to be
supported in volume for this device. Consult the local AMD
sales office to confirm availability of specific valid
combinations, to check on newly released combinations, and
to obtain additional data on AMD's standard military grade
products.
PIN DESCRIPTION


CLK Clock (Input)
Clock input to all registers.


ENF F Register Enable (Input; Active LOW)
When ENF is HIGH, the contents of the F register are static.
When ENF is LOW, the ALU output data is clocked into the
F register on the next LOW-to-HIGH transition of CLK. Note
that the F register can be made transparent by setting the
mode register bit M17 HIGH (as described in the Mode
Register Description section); when the F register is
transparent, ENF has no effect.


ENI Instruction Register Enable (Input; Active LOW)
When ENI is LOW, an instruction word is clocked into the
instruction register on the next LOW-to-HIGH transition of
CLK. The instruction word comprises the following fields: P,
Q, and T-multiplexer control inputs, rounding modes, ALU
instruction inputs, and the precision of the output operand.


ENR R Register Enable (Input; Active LOW)
When ENR is HIGH, the contents of the R register are static.
When ENR is LOW, new data is loaded into the R register
on the next LOW-to-HIGH transition of CLK.


ENRF Register File Enable (Input; Active LOW)
When ENRF is HIGH, the contents of the register file are
static. When ENRF is LOW, the ALU output operand is
clocked into the register file on the next LOW-to-HIGH
transition of CLK.


ENS S Register Enable (Input; Active LOW)
When ENS is HIGH, the contents of the S register are static.
When ENS is LOW, new data is loaded into the S register on
the next LOW-to-HIGH transition of CLK.


F0-F31 F Output Bus (Output)


FLAG0-FLAG5 Flag Outputs (Output)
The six flag outputs report the status of the last operation
executed.


FSEL Output Multiplexer Control (Input)
When FSEL is HIGH, the most significant 32 bits of the
output register are connected to the output driver. When
FSEL is LOW, the least significant 32 bits of the output
register are connected to the output driver.


I0-I13 ALU Instruction Inputs (Input)
I0-I13 select the operation to be performed by the ALU.


MSERR Master/Slave Error Flag (Output)
A HIGH level indicates a master/slave error on the current
output.


OEF F Output Bus Enable (Input; Active LOW)
When OEF is HIGH, signals F0-F31 assume a high-
impedance state. When OEF is LOW (and SLAVE is HIGH),
the output of the F multiplexer is placed on F0-F31.


OES Flag Output Enable (Input)
When OES is HIGH, outputs SIGN and FLAG0 through
FLAG5 assume a high-impedance state. When OES is LOW
(and SLAVE is HIGH), these signals are enabled.


PSEL0-PSEL3 P-Multiplexer Control Inputs (Input)
PSEL0-PSEL3 select the data input to the ALU P-port.


QSEL0-QSEL3 Q-Multiplexer Control Inputs (Input)
QSEL0-QSEL3 select the data input to the ALU Q-port.


R0-R31 R Input Bus (Input)


RFSEL0-RFSEL2 Register File Select (Input)
RFSEL0-RFSEL2 select the register file location
(RF0-RF7) to which the ALU result is to be written. Data is
written to the register file if ENRF is LOW.


RM0-RM2 Round Mode Control Inputs (Input)
The Am29C327 supports six rounding modes. RM0-RM2
select the rounding mode to be applied to the current
operation.


S0-S31 S Input Bus (Input)


S/DF F Output Single/Double Control (Input)
When S/DF is HIGH, the ALU generates a single-precision
result. When S/DF is LOW, the ALU generates a double-
precision result.


S/DR R Input Single/Double Control (Input)
When S/DR is HIGH, the data loaded into the R register is
treated as single precision. When S/DR is LOW, the data
loaded into the R register is treated as double precision.


S/DS S Input Single/Double Control (Input)
When S/DS is HIGH, the data loaded into the S register is
treated as single precision. When S/DS is LOW, the data
loaded into the S register is treated as double precision.


SIGN Sign Flag (Output)
If the final result of the last operation was negative, SIGN is
HIGH. If the final result of the last operation was not
negative, SIGN is LOW.


SLAVE Master/Slave Mode Select (Input)
When SLAVE is LOW, SLAVE mode is selected. In this
mode, all outputs except MSERR are disabled. When
SLAVE is HIGH, MASTER mode is selected.


TSEL0-TSEL3 T-Multiplexer Control Inputs (Input)
TSEL0-TSEL3 select the data input to the ALU T-port.


FUNCTIONAL DESCRIPTION


Overview


The Am29C327 is a high-performance, single- and double-
precision floating-point processor.


Architecture


The Am29C327 comprises a high-speed ALU, a 64-bit data
path, and control circuitry.


The core of the Am29C327 is a 64-bit floating-point/integer
ALU. This ALU takes operands from three 64-bit input ports
and performs the selected operation, placing the result on a
64-bit output port. Thirteen ALU flags report operation status
via the 7-bit Flag port. The ALU is completely combinatorial for
reduced latency; optional pipelining is available to boost
throughput for array operations.


The data path consists of the 32-bit input buses R and S; two
64-bit input operand registers; an 8-by-64-bit register file for
storage of intermediate results; three operand-selection
multiplexers that provide for orthogonal selection of input
operands; a 64-bit output register; and an output multiplexer that
permits the selection of 32 MSBs or 32 LSBs of data. Input
operands enter the processor through the R and S buses, and
are then demultiplexed and buffered for subsequent storage in
registers R and S. The operand selection multiplexers route
the operands to the ALU. Operation results are stored in
register F, and leave the device on the 32-bit output bus F.
The results can also be stored in the register file for use in
subsequent operations.
Instruction Set


The Am29C327 implements 58 arithmetic and logical
instructions. Thirty-five instructions operate on floating-point
numbers; these instructions fall into the following categories:

• Addition/subtraction
• Multiplication
• Multiplication-accumulation
• Comparison
• Selecting the larger or smaller of two numbers
• Rounding to integral value
• Absolute value, negation
• Reciprocal seed generation
• Conversion between any of the supported floating-point
  formats
• Conversion of a floating-point number to an integer format,
  with or without a scale factor
• Pass operand

By concatenating these operations, the user can also perform 
division, square-root extraction, polynomial evaluation, and 
other functions not implemented directly. 
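
As one illustration of such concatenation, division A/B can be built from a reciprocal seed, a few Newton-Raphson refinements, and a final multiply. The sketch below runs in ordinary Python arithmetic; crude_seed is a hypothetical stand-in for the reciprocal seed instruction, not a description of what that instruction returns.

```python
import math

def crude_seed(b):
    """Hypothetical seed: 1.5 * 2**-e for b = m * 2**e with 0.5 <= |m| < 1."""
    _, exponent = math.frexp(b)
    return math.copysign(1.5 * 2.0 ** -exponent, b)

def divide(a, b, iterations=6):
    x = crude_seed(b)                 # reciprocal seed
    for _ in range(iterations):
        x = x * (2.0 - b * x)         # multiply, 2 - product, multiply
    return a * x                      # final multiply gives a / b

quotient_1 = divide(1.0, 3.0)
quotient_2 = divide(10.0, -0.3)
print(quotient_1, quotient_2)
```

With this crude seed the initial relative error is at most one half, so six refinements are more than enough for double precision; a table-quality seed would need fewer.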


Twenty-two instructions operate on integers, and belong to the 
following general categories: 

• Addition/subtraction 
• Multiplication 
• Comparison 
• Selecting the smaller or larger of two numbers 
• Absolute value, negation, pass operand 
• Logical operations; e.g., AND, OR, XOR, NOT 
• Arithmetic, logical, and funnel shifts 
• Conversion between single- and double-precision integer 
  formats 
• Conversion of an integer number to a floating-point format, 
  with or without a scale factor 


One special instruction is provided to move data. 
Mixed-Precision Operations 


All Am29C327 instructions, floating-point or integer, can be 
performed with either single- or double-precision operands. In 
addition, the user can elect to mix precisions within an 
operation. All operations are performed in double-precision 
internally; the user specifies the precisions of the input 
operands and the required precision for the output operand. 
The necessary precision conversions are made in concert with 
the selected operation, with no additional cycle-time over- 
head. 


I/O Modes 


The Am29C327 supports eight I/O modes that afford flexible 
interface to a variety of 32- and 64-bit systems. 


Fault Detection Features 


The Am29C327 contains special comparison hardware to 
allow the operation of two processors in parallel, with one 
processor (the slave) checking the results produced by the 
other (the master). This feature is of particular importance in 
the design of high-reliability systems. 
Figure 1. Am29C327 Double-Precision Floating-Point Processor 

Block Diagram Description 


A block diagram of the Am29C327 is shown in Figure 1. The 
Am29C327 comprises input registers, operand selection multi- 
plexers, instruction register, ALU, output register/register file, 
status register, output selection multiplexer, mode register, 
and the master/slave comparator. 


Input Registers/Input Modes 


Operands enter the processor through the R and S buses, and 
are then demultiplexed and buffered for subsequent storage in 
the 65-bit registers R and S. Input operands may be either 
single-precision (32-bit) or double-precision (64-bit) as speci- 
fied by S/DR and S/DS. Accompanying the input registers are 
two 32-bit temporary registers, R-Temp and S-Temp, that 
allow for the overlapping of operand transfers and ALU 
operations. This arrangement of temporary registers and 
demultiplexers permits data and corresponding precision bit 
S/DR or S/DS to be loaded into the 65-bit R register and 65- 
bit S register via one of the eight input modes: 


• 32-bit-bus, double-cycle, LSWs first 
• 32-bit-bus, double-cycle, MSWs first 
• 32-bit-bus, single-cycle, LSWs first 
• 32-bit-bus, single-cycle, MSWs first 
• 64-bit-bus, double-cycle, R first 
• 64-bit-bus, double-cycle, S first 
• 64-bit-bus, single-cycle, R first 
• 64-bit-bus, single-cycle, S first 


These modes are described in detail in the Input Modes 
Description section. 


Operand Selection Multiplexers 


The operand selection multiplexers route operands to the 
ALU. In addition to selecting operands from input registers 
R and S and register file locations RF0 - RF7, these 
multiplexers have access to a set of constants (0, 0.5, 1, 2, 3, Pi). 
These constants are double-precision preprogrammed num- 
bers for use in ALU operations, and are automatically provided 
in the appropriate floating-point or integer format. 


Instruction Register 


The instruction register stores a 32-bit word specifying the 
current processor operation. Included in the instruction word 
are fields that specify the P, Q, and T multiplexer selects; the 
rounding modes; the core operation to be performed by the 
ALU; sign-change controls for ALU input and result operands; 
and the single/double-precision control for the output oper- 
and. The multiplexer selects and the instruction word are 
described in detail in the Instruction Set section; rounding 
modes are described in Appendix B. 


ALU 


The ALU is a combinatorial arithmetic/logic unit that performs 
a large repertoire of floating-point and integer operations. The 
ALU has three operand inputs, and performs operations of the 
form (P*Q) + T. Most ALU operations require only one or two 
input operands; for example, addition requires only operands 
P and T, multiplication only operands P and Q, and precision 
conversion only operand P. Many ALU arithmetic operations 
allow for the independent control of operand signs, thus 
greatly increasing the number of arithmetic expressions that 
can be evaluated in a single ALU pass. 


The ALU can be configured in either a flow-through mode, for 
which the ALU is completely combinatorial, or a pipelined 
mode, for which ALU operations incur one or two pipeline 
delays, but which results in a higher throughput than flow- 
through mode. 


A detailed description of ALU operations appears in the 
Instruction Set section. 


Output Register/Register File 


The results of the operations performed by the ALU are stored 
in the 64-bit output register F. Results can also be stored in 
the 8-by-64-bit register file for use in subsequent operations. 
Each register file location contains a 65th bit indicating the 
precision of the operand stored in that location, thus permitting 
the ALU to correctly process the operand in subsequent 
operations. 


Status Register 


The status register is a 7-bit register that stores flags 
pertaining to the most recently performed operation. A de- 
tailed description is provided in the Instruction Set section. 


Output Multiplexer 


The output multiplexer routes operation results to the F bus. 
This multiplexer selects the 32 MSBs of the output register or 
the 32 LSBs. 


Master/Slave Comparator 


Each Am29C327 output signal has associated logic that 
compares that signal with the signal that the processor is 
providing internally to the output driver; any discrepancies are 
indicated by assertion of signal MSERR. 


For a single processor, this output comparison detects short 
circuits in output signals or defective output drivers, but does 
not detect open circuits. It is possible to connect a second 
processor in parallel with the first, with the second processor's 
outputs disabled by assertion of signal SLAVE. The second 
processor detects open-circuit signals, as well as providing a 
check of the outputs of the first. 
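
The comparison can be pictured with the short Python sketch below. It treats the 32 output signals as one integer and reports an error when the value seen at the pins differs from the value generated internally. The function name and the example bit patterns are illustrative assumptions, not device definitions.

```python
def mserr(internal_value, pin_value):
    """Return True when any output pin differs from the internally generated bit."""
    return (internal_value ^ pin_value) != 0

# Master and slave compute the same result; the slave's drivers are disabled,
# so it compares its own internal result with what the master drives.
master_drives = 0x3FF00000
slave_internal = 0x3FF00000
slave_reports_error = mserr(slave_internal, master_drives)

# A pin stuck LOW on the bus (bit 20 here) is flagged by the comparison.
faulty_bus = master_drives & ~(1 << 20)
fault_detected = mserr(slave_internal, faulty_bus)
```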


Mode Register 


The mode register contains processor parameters that are 
changed infrequently. The 32-bit mode word is loaded into the 
register via the R bus. A detailed description of the mode 
register is provided in the Mode Register Description section. 
Mode Register Description 


The "Load Mode Register" instruction loads a 32-bit word 
appearing on the R port into the mode register. Data is 
clocked into the register on the LOW-to-HIGH transition of 
CLK. The register is organized as described below: 


M0 - M3 - Floating-Point Format Select: 


M1   M0   Primary Format 
0    0    IEEE 
0    1    DEC F (SINGLE), DEC D (DOUBLE) 
1    0    DEC F (SINGLE), DEC G (DOUBLE) 
1    1    IBM 


M3   M2   Alternate Format 
0    0    IEEE 
0    1    DEC F (SINGLE), DEC D (DOUBLE) 
1    0    DEC F (SINGLE), DEC G (DOUBLE) 
1    1    IBM 





Primary and Alternate Floating-Point Formats 


All floating-point operations with the appropriate precisions 
are performed in the primary format selected by mode register 
bits M0 and M1 except for the two following operations: 

1. "Convert T to Alternate Floating-Point Format" in 
   which the T operand is in the Primary Floating-Point 
   Format selected by mode register bits M0 and M1, 
   and the result generated is in the Alternate Float- 
   ing-Point Format specified by mode register bits M2 
   and M3. 

2. "Convert T from Alternate Floating-Point Format" 
   in which the T operand is in the Alternate Floating- 
   Point Format specified by mode register bits M2 
   and M3, and the result is in the Primary Floating- 
   Point Format specified by mode register bits M0 
   and M1. 


Conversion or Scaling from Integer to Floating-Point gener- 
ates a floating-point result in the Primary Floating-Point Format 
selected by mode register bits M0 and M1. 


When mode register bits M2 and M3 are not used to specify an 
Alternate Floating-Point Format, they are "don't cares". 


Floating-point formats are discussed in further detail in Appen- 
dix A. 


M4 — Saturate Enable: If M4 is HIGH, overflowed results are 
replaced by the largest representable value in the selected 
format of the same sign as the overflowed result. If M4 is 
LOW, the result is not changed. If M6 is HIGH and the result 
format is IEEE, saturation is disabled. 


M5 — IEEE Affine/Projective Select: If M5 is HIGH, affine 
mode is selected. If M5 is LOW, projective mode is selected. 
The interpretation of infinities is determined by M5. The only 
differences between the modes occur during the addition and 
subtraction of infinities. 


Operation      Affine Mode   Projective Mode 

(+∞) + (+∞)    Output +∞     Output Quiet NaN, set invalid and 
                             reserved operand flags 
(-∞) + (-∞)    Output -∞     Output Quiet NaN, set invalid and 
                             reserved operand flags 
(+∞) - (-∞)    Output +∞     Output Quiet NaN, set invalid and 
                             reserved operand flags 
(-∞) - (+∞)    Output -∞     Output Quiet NaN, set invalid and 
                             reserved operand flags 


If the current floating-point format is not IEEE, this bit has no 
effect. 
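
The effect of M5 on the sum of two infinities can be sketched in Python as follows. The function name and the single combined flag are simplifications introduced for this example, and the treatment of unlike-signed infinities in affine mode follows ordinary IEEE practice rather than anything stated in the table.

```python
import math

def add_infinities(a, b, affine):
    """Sum of two infinite operands; returns (result, flags_set).

    flags_set stands in for the invalid and reserved operand flags.
    A subtraction a - b is modeled by passing -b as the second operand.
    """
    if affine and a == b:
        return a, False           # like-signed infinities keep their sign
    return math.nan, True         # projective mode, or unlike signs (assumed)
```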


M6 - IEEE Trap Enable: If M6 is HIGH and the result format 
is IEEE, IEEE trapped operation is enabled; the saturate (M4) 
and sudden underflow (M7) bits are ignored. For an under- 
flowed result, the exponent is replaced by e = e + 192 (SP), or 
e = e + 1536 (DP), with the significand unchanged. For an 
overflowed result, the exponent is replaced by e = e - 192 
(SP), or e = e - 1536 (DP), with the significand unchanged. If 
M6 is LOW or the result format is not IEEE, IEEE trapped 
operation is disabled. 
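
The exponent adjustment applied in trapped operation can be written out as a small Python sketch. Only the constants 192 and 1536 come from the description above; the function and argument names are hypothetical.

```python
# Exponent adjustment for trapped results: 192 for single precision,
# 1536 for double precision.
TRAP_ADJUST = {"SP": 192, "DP": 1536}

def trapped_exponent(e, precision, condition):
    """Return the wrapped exponent for an underflowed or overflowed result."""
    adjust = TRAP_ADJUST[precision]
    if condition == "underflow":
        return e + adjust
    if condition == "overflow":
        return e - adjust
    raise ValueError("condition must be 'underflow' or 'overflow'")
```

For example, a single-precision result whose true exponent is -130 is delivered with exponent 62, which a trap handler can scale back by 192.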


M7 — IEEE Sudden Underflow Enable: If M7 is HIGH and 
IEEE traps are disabled (M6 LOW), all IEEE denormalized 
results are replaced by a zero of the same sign. If M7 is LOW, 
a valid denormalized number will be produced. This bit has no 
effect for result formats other than IEEE. 
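
A minimal Python sketch of sudden underflow is given below, assuming an IEEE double-precision result and traps disabled (M6 LOW). The function name is hypothetical, and Python floats are used only as a stand-in for the device's result format.

```python
import math
import sys

def sudden_underflow(result, m7_high):
    """Replace a denormalized double by a zero of the same sign when M7 is HIGH."""
    is_denormal = result != 0.0 and abs(result) < sys.float_info.min
    if m7_high and is_denormal:
        return math.copysign(0.0, result)
    return result
```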


M8 - IBM Significance Mask Enable: If M8 is HIGH, certain 
IBM operations having intermediate results of 0 will produce a 
final result of 0 with the biased exponent unchanged. If M8 is 
LOW, these operations will produce a final result of true-zero. 
This bit has no effect for result formats other than IBM. 


M9 - IBM Underflow Mask Enable: If M9 is HIGH, certain 
underflowed IBM operations will produce a normalized result 
with the exponent replaced by e + 128. If M9 is LOW, these 
operations will produce a final result of true-zero. This bit has 
no effect for result formats other than IBM. 


M10: Reserved for future use (must be set to Logic 0) 


M11 - Integer Multiplication Signed/Unsigned Select: If 
M11 is HIGH, the input operands are treated as two's- 
complement numbers. If M11 is LOW, the input operands are 
treated as unsigned numbers. This bit has no effect for 
operations other than integer multiplication. 


M12, M13 - Integer Multiplication Format Adjust: Selects 
the output format for integer multiplications. The user may 
select either the MSBs or the LSBs of the result of an integer 
multiplication: 


M13  M12  Output Format 
0    0    LSBs 
0    1    LSBs, format-adjusted 
1    0    MSBs 
1    1    MSBs, format-adjusted 


"Format-adjusted" indicates that the product is shifted left 
one place before the MSBs or LSBs are selected. 
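
The selection of MSBs or LSBs, with and without format adjustment, can be illustrated by the following Python sketch. It assumes 32-bit operands and a 64-bit product; the function and argument names are hypothetical and stand in for mode bits M11, M12, and M13.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement number."""
    return x - (1 << 32) if x & 0x80000000 else x

def int_multiply(a, b, signed, msbs, format_adjust):
    """Multiply two 32-bit patterns and return the selected 32-bit half."""
    if signed:                       # M11 HIGH: two's-complement operands
        a, b = to_signed32(a), to_signed32(b)
    product = a * b
    if format_adjust:                # shift left one place before selecting
        product <<= 1
    product &= MASK64                # keep a 64-bit result
    return (product >> 32) & MASK32 if msbs else product & MASK32
```

One use of the format-adjusted MSBs, offered here only as an illustration, is fractional arithmetic: with signed operands scaled by 2^31, 0x40000000 (0.5) times 0x40000000 yields 0x20000000 (0.25) in the same scaling.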
M14 - M16 - Input Mode: Selects the input bus mode: 


M16  M15  M14  Input Mode 
0    0    0    32-bit-bus, single-cycle, LSW first 
0    0    1    32-bit-bus, single-cycle, MSW first 
0    1    0    32-bit-bus, double-cycle, LSW first 
0    1    1    32-bit-bus, double-cycle, MSW first 
1    0    0    64-bit-bus, single-cycle, R first 
1    0    1    64-bit-bus, single-cycle, S first 
1    1    0    64-bit-bus, double-cycle, R first 
1    1    1    64-bit-bus, double-cycle, S first 


Additional information on input modes can be found in 
the Input Modes section. 





M17 - F Register Feedthrough Enable: When M17 is HIGH, 
register F is made transparent. When M17 is LOW, the ALU 
output data is clocked into the F register on the next LOW-to- 
HIGH transition of CLK. 


M18 - Status Register Feedthrough Enable: When M18 is 
HIGH, the status register is made transparent. When M18 is 
LOW, the output flags are clocked into the status register on 
the next LOW-to-HIGH transition on CLK. 


M19, M20 - Pipeline Mode Select: 


M20  M19  Pipeline Mode 
0    X    Flow-through mode 
1    0    Single-pipeline mode for all operations 
1    1    Double-pipeline mode for multiply/accumulate; 
          single-pipeline mode for other operations 


M21 - M31 — Reserved for factory test (must be set to Logic 0) 
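
The bit assignments above can be collected into a 32-bit mode word, as in the Python sketch below. The function, its argument names, and the dictionary keys are hypothetical; the field positions follow the descriptions of M0 through M20, with M10 and M21 - M31 left at Logic 0.

```python
FORMATS = {"IEEE": 0b00, "DEC_D": 0b01, "DEC_G": 0b10, "IBM": 0b11}
PIPELINE = {"flow": 0b00, "single": 0b10, "double": 0b11}   # (M20, M19)

def mode_word(primary="IEEE", alternate="IEEE", saturate=False, affine=False,
              trap=False, sudden_underflow=False, ibm_sig_mask=False,
              ibm_underflow_mask=False, mult_signed=False, mult_format=0,
              input_mode=0, f_feedthrough=False, status_feedthrough=False,
              pipeline="flow"):
    """Assemble the mode word presented on the R port."""
    word = FORMATS[primary]                  # M1, M0
    word |= FORMATS[alternate] << 2          # M3, M2
    word |= int(saturate) << 4               # M4
    word |= int(affine) << 5                 # M5
    word |= int(trap) << 6                   # M6
    word |= int(sudden_underflow) << 7       # M7
    word |= int(ibm_sig_mask) << 8           # M8
    word |= int(ibm_underflow_mask) << 9     # M9
    word |= int(mult_signed) << 11           # M11 (M10 stays 0)
    word |= (mult_format & 0b11) << 12       # M13, M12
    word |= (input_mode & 0b111) << 14       # M16, M15, M14
    word |= int(f_feedthrough) << 17         # M17
    word |= int(status_feedthrough) << 18    # M18
    word |= PIPELINE[pipeline] << 19         # M20, M19
    return word
```

For instance, `mode_word(input_mode=0b111, pipeline="double")` selects the 64-bit-bus, double-cycle, S-first input mode together with double-pipeline operation for multiply/accumulate.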


Input Modes 


The Am29C327 supports a total of eight input modes for 
loading data into the R and S registers. 


The 32-bit bus modes allow the user to connect each input 
port (R0 - R31 and S0 - S31) to separate 32-bit buses. 64-bit 
operands can then be loaded by placing the MSBs and LSBs 
alternately on the appropriate ports. In the 64-bit bus modes, 
the two input ports are configured internally as a single 64-bit 
port. The Am29C327 may then be connected directly to a 
64-bit bus, and 64-bit operands may be loaded in a single 
operation. Either the 32-bit bus modes or the 64-bit bus modes 
may be used regardless of the precision of the operands being 
transferred; the choice of input mode will in practice be 
determined by the system into which the Am29C327 is to be 
integrated. 


Single-cycle input modes allow two 64-bit operands to be 
loaded in a single clock cycle. This necessitates driving the 
input buses at twice the speed of the Am29C327. For systems 
where this is not practical, the double-cycle modes allow the 
loading of one 64-bit operand (or two 32-bit operands) per 
clock cycle. 


Data may be loaded from the input buses to the R register and 
S register using one of the eight input modes: 


1. 32-Bit Bus, Single-Cycle, LSWs First 
2. 32-Bit Bus, Single-Cycle, MSWs First 
3. 32-Bit Bus, Double-Cycle, LSWs First 
4. 32-Bit Bus, Double-Cycle, MSWs First 
5. 64-Bit Bus, Single-Cycle, R First 
6. 64-Bit Bus, Single-Cycle, S First 
7. 64-Bit Bus, Double-Cycle, R First 
8. 64-Bit Bus, Double-Cycle, S First 


The choice of the input modes is determined by mode register 
bits M14-M16. 


In order to permit the loading of new operands to be 
overlapped with the execution of a current operation, tempo- 
rary registers are provided within the "operand router" block 
(shown in Figure 1). The operation of these temporary 
registers is transparent to the user. The conditions under 
which they are loaded depend on the input mode selected. 


The eight input modes are described on the following pages. 
32-Bit Bus, Single-Cycle, LSW First (M16 = 0, M15 = 0, 
M14 = 0) 


In this mode, the two halves of the 64-bit R operand are 
placed on the R-input bus in successive half-cycles, with the S 
operand similarly placed on the S-input port. After one 
complete cycle, the R and S registers contain the R and S 
operands, respectively. 


Timing of Operations with Input Mode 1 
(32-Bit Bus, Single-Cycle, LSW First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
HIGH-to-LOW clock transition. 


At 1, the least-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. Both words are loaded on the 
HIGH-to-LOW transition of the clock. 


At 2, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the most-significant half of the R 
register, and the most-significant 32 bits of the S operand are 
loaded from the S-input port into the most-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the least-significant half of the R register, and the 
output of the S-temp register is loaded into the least- 
significant half of the S register. 


If an input operand is single-precision, the 32-bit data is kept 
on the input bus for the full cycle. 
32-Bit Bus, Single-Cycle, MSW First (M16 = 0, M15 = 0, 
M14 = 1) 


In this mode, the two halves of the 64-bit R operand are 
placed on the R-input bus in successive half-cycles, with the S 
operand similarly placed on the S-input port. After one 
complete cycle, the R and S registers contain the R and S 
operands, respectively. 


Timing of Operations with Input Mode 2 
(32-Bit Bus, Single-Cycle, MSW First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
HIGH-to-LOW clock transition. 


At 1, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the most- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. Both words are loaded on the 
HIGH-to-LOW transition of the clock. 


At 2, the least-significant 32 bits of the R operand are loaded 
from the R-input port into the least-significant half of the R 
register, and the least-significant 32 bits of the S operand are 
loaded from the S-input port into the least-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the R register, and the 
output of the S-temp register is loaded into the most- 
significant half of the S register. 


If an input operand is single-precision, the 32-bit data is kept 
on the input bus for the full cycle. 
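
The word ordering of the 32-bit bus modes can be sketched in Python as follows. The sketch models one operand port only, with the temporary register holding the word that arrives first; the function and variable names are hypothetical, and clock edges are not modeled.

```python
MASK32 = 0xFFFFFFFF

def load_operand_32bit_bus(first_word, second_word, lsw_first):
    """Assemble a 64-bit operand from two 32-bit transfers on one port."""
    temp = first_word & MASK32            # step 1: first word into the temp register
    second = second_word & MASK32         # step 2: second word goes straight in
    if lsw_first:
        return (second << 32) | temp      # temp supplies the least-significant half
    return (temp << 32) | second          # temp supplies the most-significant half
```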
32-Bit Bus, Double-Cycle, LSW First (M16 = 0, M15 = 1, 
M14 = 0) 


In this mode, the two halves of the 64-bit R operand are 
placed on the R-input bus in successive cycles, with the S 
operand similarly placed on the S-input port. After two cycles, 
the R and S registers contain the R and S operands, 
respectively. 


Timing of Operations with Input Mode 3 
(32-Bit Bus, Double-Cycle, LSW First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
LOW-to-HIGH clock transition. 


At 1, the least-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. 


At 2, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the most-significant half of the R 
register, and the most-significant 32 bits of the S operand are 
loaded from the S-input port into the most-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the least-significant half of the R register, and the 
output of the S-temp register is loaded into the least- 
significant half of the S register. 
32-Bit Bus, Double-Cycle, MSW First (M16 = 0, M15 = 1, 
M14 = 1) 


In this mode, the two halves of the 64-bit R operand are 
placed on the R-input bus in successive cycles, with the S 
operand similarly placed on the S-input port. After two cycles, 
the R and S registers contain the R and S operands, 
respectively. 


Timing of Operations with Input Mode 4 
(32-Bit Bus, Double-Cycle, MSW First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
LOW-to-HIGH clock transition. 


At 1, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the most- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. 


At 2, the least-significant 32 bits of the R operand are loaded 
from the R-input port into the least-significant half of the R 
register, and the least-significant 32 bits of the S operand are 
loaded from the S-input port into the least-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the R register, and the 
output of the S-temp register is loaded into the most- 
significant half of the S register. 
64-Bit Bus, Single-Cycle, R First (M16 = 1, M15 = 0, 
M14 = 0) 


In this mode, the MSW of the 64-bit R operand is placed on 
the R-input bus and the LSW on the S-input bus. Both 
halfwords are loaded in the first half cycle. Similarly, the two 
halves of the S operand are loaded in the second half cycle. 
After one full cycle, the R and S registers contain the R and S 
operands, respectively. 


Timing of Operations with Input Mode 5 
(64-Bit Bus, Single-Cycle, R First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
HIGH-to-LOW clock transition. 


At 1, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the R operand are loaded from the S- 
input port into the S-temp register. 


At 2, the most-significant 32 bits of the S operand are loaded 
from the R-input port into the most-significant half of the S 
register, and the least-significant 32 bits of the S operand are 
loaded from the S-input port into the least-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the R register, and the 
output of the S-temp register is loaded into the least- 
significant half of the R register. 
64-Bit Bus, Single-Cycle, S First (M16 = 1, M15 = 0, 
M14 = 1) 


In this mode, the MSW of the 64-bit S operand is placed on the 
R-input bus and the LSW on the S-input bus. Both halfwords 
are loaded in the first half cycle. Similarly, the two halves of 
the R operand are loaded in the second half cycle. After one 
full cycle, the R and S registers contain the R and S operands, 
respectively. 


Timing of Operations with Input Mode 6 
(64-Bit Bus, Single-Cycle, S First)* 





*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
HIGH-to-LOW clock transition. 


At 1, the most-significant 32 bits of the S operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. 


At 2, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the most-significant half of the R 
register, and the least-significant 32 bits of the R operand are 
loaded from the S-input port into the least-significant half of 
the R register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the S register, and the 
output of the S-temp register is loaded into the least- 
significant half of the S register. 








64-Bit Bus, Double-Cycle, R First (M16 = 1, M15 = 1, 
M14 = 0) 


In this mode, the MSW of the 64-bit R operand is placed on 
the R-input bus and the LSW on the S-input bus. Both 
halfwords are loaded in the first cycle. Similarly, the two halves 
of the S operand are loaded in the second cycle. After the two 
cycles, the R and S registers contain the R and S operands, 
respectively. 


Timing of Operations with Input Mode 7 
(64-Bit Bus, Double-Cycle, R First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
LOW-to-HIGH clock transition. 


At 1, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the R operand are loaded from the S- 
input port into the S-temp register. 


At 2, the most-significant 32 bits of the S operand are loaded 
from the R-input port into the most-significant half of the S 
register, and the least-significant 32 bits of the S operand are 
loaded from the S-input port into the least-significant half of 
the S register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the R register, and the 
output of the S-temp register is loaded into the least- 
significant half of the R register. 
64-Bit Bus, Double-Cycle, S First (M16 = 1, M15 = 1, 
M14 = 1) 


In this mode, the MSW of the 64-bit S operand is placed on the 
R-input bus and the LSW on the S-input bus. Both halfwords 
are loaded in the first cycle. Similarly, the two halves of the R 
operand are loaded in the second cycle. After the two cycles, 
the R and S registers contain the R and S operands, 
respectively. 


Timing of Operations with Input Mode 8 
(64-Bit Bus, Double-Cycle, S First)* 


*Assumes flow-through operation, F register, and S register clocked. 


In this mode, the temporary registers are clocked on every 
LOW-to-HIGH clock transition. 


At 1, the most-significant 32 bits of the S operand are loaded 
from the R-input port into the R-temp register, and the least- 
significant 32 bits of the S operand are loaded from the S-input 
port into the S-temp register. 


At 2, the most-significant 32 bits of the R operand are loaded 
from the R-input port into the most-significant half of the R 
register, and the least-significant 32 bits of the R operand are 
loaded from the S-input port into the least-significant half of 
the R register. 


At the same time, at 2, the output of the R-temp register is 
loaded into the most-significant half of the S register, and the 
output of the S-temp register is loaded into the least- 
significant half of the S register. 
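
For the 64-bit bus modes, the routing of the two transfers to the R and S registers can be sketched in Python as below. Each transfer is modeled as a pair (word on the R-input port, word on the S-input port), with the R-input port carrying the MSW; the function name and data layout are assumptions made for this example, and clock edges are not modeled.

```python
def load_64bit_bus(transfers, r_first):
    """Return (R, S) register contents after two 64-bit transfers.

    transfers: two (r_port_word, s_port_word) pairs in bus order.
    """
    (msw1, lsw1), (msw2, lsw2) = transfers
    first = (msw1 << 32) | lsw1       # held in R-temp / S-temp at step 1
    second = (msw2 << 32) | lsw2      # loaded directly at step 2
    return (first, second) if r_first else (second, first)
```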
Pipelining of Operations 


The floating-point ALU of the Am29C327 may be operated in 
one of three pipeline modes: 


1. Flow-Through Mode 
2. Single-Pipelined Mode 
3. Double-Pipelined Mode 


Flow-Through Mode 


In this mode the floating-point ALU acts as a purely combina- 
torial device. 





Single-Pipelined Mode 


In this mode the floating-point ALU contains a single pipeline 
delay for all operations; throughput is roughly double that for 
unpipelined mode. Simplified diagrams for the ALU configura- 
tion for single-pipelined mode are shown in Figure 2. 


Double-Pipelined Mode 


In this mode, which applies only to the multiplication-accumu- 
lation operation, the ALU contains two pipeline delays; 
throughput is roughly triple that for the unpipelined multiplica- 
tion-accumulation operation. Simplified block diagrams are 
shown in Figure 3. 


Figures 4 and 5 provide timing diagrams for all operations 
except multiply-accumulate, illustrating flow-through mode and 
pipelined mode, respectively. Figures 6, 7, and 8 provide 
timing diagrams for multiply-accumulate, illustrating flow- 
through mode, single-pipelined mode, and double-pipelined 
mode, respectively. 







The choice of pipelining mode affects only the floating-point 
ALU. Operations of other parts of the Am29C327, such as the 
input registers, the output register, the mode register, and the 
instruction register are not affected by the choice of pipelining 
mode. However, the instruction bits are pipelined as they pass 
through the ALU. This permits instructions to be interleaved in 
pipelined mode. 


The desired pipeline mode or modes can be invoked by setting 
mode register bits M19 and M20 to the appropriate values. 


When using the Am29C327 in either single-pipelined or 
double-pipelined mode, two conditions must be observed: 


1. The "load mode register" instruction is not pipelined, nor 
   are any of the mode register bits. When the mode register 
   is loaded, any differences between the current mode and 
   the previous mode take effect immediately. In single- 
   pipelined mode, the user should separate the last valid 
   ALU instruction and the "load mode register" instruction 
   with one "NO-OP" instruction. In double-pipelined mode, 
   the user should separate them with two "NO-OP" instruc- 
   tions. A NO-OP instruction is any instruction whose result 
   is not stored in register F or the register file. 


2. A multiplication-accumulation instruction cannot be imme- 
   diately followed by any other type of instruction. This 
   problem can be avoided by inserting a "dummy" multipli- 
   cation-accumulation instruction at the end of a multiplica- 
   tion-accumulation sequence. This "dummy" is any in- 
   struction whose results are not stored in register F or the 
   register file. 
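
The first condition lends itself to a simple check at microcode assembly time. The Python sketch below pads an instruction list with NO-OPs ahead of each "load mode register" instruction; the mnemonics, the function name, and the list representation are hypothetical. It assumes that flow-through mode needs no padding, and it does not model the second condition.

```python
# NO-OPs required between the last valid ALU instruction and LOAD_MODE.
NOOPS_REQUIRED = {"flow": 0, "single": 1, "double": 2}

def pad_mode_loads(instructions, pipeline_mode):
    """Insert NOPs so each LOAD_MODE is separated from the preceding ALU work."""
    required = NOOPS_REQUIRED[pipeline_mode]
    out = []
    for ins in instructions:
        if ins == "LOAD_MODE" and any(prev != "NOP" for prev in out):
            trailing = 0
            for prev in reversed(out):       # count NOPs already in place
                if prev != "NOP":
                    break
                trailing += 1
            out.extend(["NOP"] * max(0, required - trailing))
        out.append(ins)
    return out
```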
Figure 2. ALU Configuration for Single-Pipelined Mode 
a) MULTIPLY-ACCUMULATE 
b) OTHER OPERATIONS 


Figure 3. ALU Configuration for Double-Pipelined Mode 
Figure 4. Timing for All Operations EXCEPT Multiply-Accumulate, Flow-Through Mode 
Figure 5. Timing for All Operations EXCEPT Multiply-Accumulate, Pipelined Mode 
Figure 6. Timing for Multiply-Accumulate, Flow-Through Mode 
Figure 7. Timing for Multiply-Accumulate, Single-Pipelined Mode 
Figure 8. Timing for Multiply-Accumulate, Double-Pipelined Mode 
Instruction Set 
Instruction Register Format 


The 14-bit instruction word I0 - I13 comprises sign-change 
controls, integer/floating-point select bit, and the opcode. 


I13 I12 I11 I10 I9 I8 I7 I6 I5 I4 I3 I2 I1 I0 


SIGN (P) SIGN (Q) SIGN (T) SIGN (F) OPCODE 


The opcode field, I4 - I0, specifies the core operation to be 
performed by the ALU; instruction bit I5 selects between 
floating-point and integer formats. The core operations and 
their corresponding opcodes are listed in Table 1. 

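The field split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_instruction` is hypothetical, and the sketch assumes that the sign-change controls occupy the remaining upper bits I13-I6 without decoding them further.

```python
def split_instruction(word):
    """Split a 14-bit instruction word into its three parts.

    Returns (sign_fields, int_fp_select, opcode), where
      opcode        = bits I4-I0,
      int_fp_select = bit I5,
      sign_fields   = bits I13-I6 (assumed; left undecoded here).
    """
    if not 0 <= word < (1 << 14):
        raise ValueError("instruction word must fit in 14 bits")
    opcode = word & 0b11111            # I4-I0
    int_fp_select = (word >> 5) & 1    # I5
    sign_fields = word >> 6            # I13-I6
    return sign_fields, int_fp_select, opcode
```

For example, `split_instruction(0b00000000100001)` separates the word into sign fields 0, select bit 1, and opcode 00001.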

TABLE 1. CORE OPERATIONS/OPCODES 


Floating-point operations:

P
P + T
P * Q
COMPARE P, T
MAX P, T
MIN P, T
CONVERT T TO INTEGER
SCALE T TO INTEGER BY Q
(P * Q) + T
ROUND T TO INTEGRAL VALUE
RECIPROCAL SEED OF P
CONVERT T TO ALTERNATE F.P. FORMAT
CONVERT T FROM ALTERNATE F.P. FORMAT


Integer operations:

P
P + T
P * Q
COMPARE P, T
MAX P, T
MIN P, T
CONVERT T TO FLOATING-POINT
SCALE T TO FLOATING-POINT BY Q
P OR T
P AND T
P XOR T
SHIFT P LOGICAL Q PLACES
SHIFT P ARITHMETIC Q PLACES
FUNNEL SHIFT PT LOGICAL Q PLACES
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Core operations MOVE P and LOAD MODE REGISTER can both be performed in either floating-point or integer format: 


I5    I4 I3 I2 I1 I0    Operation

x     1  1  0  0  0     MOVE P
x     1  1  1  1  1     LOAD MODE REGISTER








Sign-Change Selects 


Each ALU input and output operand has associated hardware 
that can be used to modify operand signs (see Figure 9). 
These sign-change blocks, when applied to core operations, 
greatly increase the number of available operations. A core
operation of P + T, for example, can be used to perform
operations such as P - T, ABS(P + T), ABS(P) + ABS(T), and
others, simply by modifying the signs of the input and output 
operands. 
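The way sign changes multiply the usefulness of a single core operation can be modeled in Python. The sketch below is an illustration for floating-point (sign-magnitude) operands only; the mode names ("keep", "invert", "zero", "one", "p", "t") and the function names are hypothetical, and the sketch assumes that the output block sees the original signs of P and T.

```python
import math


def input_sign(value, mode):
    """Model of an input sign-change block."""
    if mode == "keep":
        return value
    if mode == "invert":
        return -value
    if mode == "zero":          # sign bit forced to 0 (non-negative)
        return abs(value)
    if mode == "one":           # sign bit forced to 1 (negative)
        return -abs(value)
    raise ValueError(mode)


def output_sign(result, mode, p, t):
    """Model of the output sign-change block."""
    if mode == "keep":
        return result
    if mode == "zero":
        return abs(result)
    if mode == "one":
        return -abs(result)
    if mode == "p":             # take the sign of the P input operand
        return math.copysign(result, p)
    if mode == "t":             # take the sign of the T input operand
        return math.copysign(result, t)
    raise ValueError(mode)


def add_core(p, t, sign_p="keep", sign_t="keep", sign_f="keep"):
    """Core operation P + T wrapped in the three sign-change blocks."""
    result = input_sign(p, sign_p) + input_sign(t, sign_t)
    return output_sign(result, sign_f, p, t)
```

With this model, `add_core(p, t, sign_t="invert")` yields P - T, `add_core(p, t, sign_f="zero")` yields ABS(P + T), and `add_core(p, t, sign_p="zero", sign_t="zero")` yields ABS(P) + ABS(T).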
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TABLE 2-1. SELECT DECODE FOR P OPERAND 
SIGN-CHANGE BLOCK


Selectable sign for operand P:
SIGN (P) (unchanged)
SIGN (P) inverted
0
1

TABLE 2-3. SELECT DECODE FOR T OPERAND
SIGN-CHANGE BLOCK


Selectable sign for operand T:
SIGN (T) (unchanged)
SIGN (T) inverted
0
1

Using the sign-change blocks, the sign of an input operand 
may be left unchanged, inverted, set to zero, or set to one; the 
sign of the output operand may be left unchanged, set to zero, 
set to one, set to the sign of the P input operand, or set to the 
sign of the T input operand. Select decodes for the P, Q, T,
and F operand sign-change blocks are shown in Tables 2-1, 2-2,
2-3, and 2-4, respectively.
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TABLE 2-2. SELECT DECODE FOR Q OPERAND 
SIGN-CHANGE BLOCK


Selectable sign for operand Q:
SIGN (Q) (unchanged)
SIGN (Q) inverted
0
1


TABLE 2-4. SELECT DECODE FOR F OPERAND 
SIGN-CHANGE BLOCK


Selectable sign for result F:
SIGN (F') (unchanged)
SIGN (F') inverted
0
1
SIGN (P)
SIGN (T)
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Operand Multipiexer Selects 


Instruction fields PSEL0-PSEL3, QSEL0-QSEL3, and
TSEL0-TSEL3 specify the select codes for the P, Q, and T
operand multiplexers, respectively; the codes are summarized
in Table 3.


TABLE 3. OPERAND MULTIPLEXER SELECT CODES


Selectable operand sources:

Register R
Register S
0
0.5 (Floating Point); -1 (Integer)
Pi (Floating Point); Max Neg. Two's-Comp. Value (Integer)
Register File Location 0 (RF0)
Register File Location 1 (RF1)
Register File Location 2 (RF2)
Register File Location 3 (RF3)
Register File Location 4 (RF4)
Register File Location 5 (RF5)
Register File Location 6 (RF6)
Register File Location 7 (RF7)


Operand Precisions


The Am29C327 supports mixed-precision operations, so that it
is possible, for example, for an operation to have single-precision
inputs and a double-precision output, or one single-
and one double-precision input, or any other combination.


Precision of the operands in registers R and S is specified by
signals S/DR and S/DS. A logic HIGH indicates a single-precision
operand or operands; a LOW, double precision.


Precision of an operation result is specified by signal S/DF. A
logic HIGH indicates a single-precision operand; a logic LOW,
double-precision.


Operands stored in the register file are each accompanied by
a bit indicating that operand's precision; this precision information
is automatically supplied to the ALU when a register file
location is used as an input operand to an operation.


Processor Operations 


Table 4 illustrates a number of possible ALU instructions
comprising the opcode, integer/floating-point select, and sign-change
fields. Note that the remaining instruction bits (the P, Q,
and T operand multiplexer selects, the rounding modes, and the
output operand precision) can be specified independently.


The user may create instructions using instruction words other 
than those listed in Table 4. For some core operations, sign- 
change control settings are completely arbitrary; for others, 
only the sign-change field values shown in Table 4 are valid. 
Table 5 summarizes permissible sign-change field values for 
each core operation. 
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TABLE 4. INSTRUCTION WORDS 


Operation 


P 
-P 

ABS (P) 

Sign (T)*ABS (P) 


P+T 

P-T 

T-P 

-P-T 

ABS (P + T) 

ABS (P -T) 

ABS (P) + ABS (T)
ABS (P) - ABS (T)
ABS (ABS (P) - ABS (T))
P*Q 

(-P) * Q 

ABS (P * Q) 
Compare P, T 


Max P, T 
Max ABS (P), ABS (T) 


Min P, T 
Min ABS (P), ABS (T) 


Limit P to Magnitude T 
Convert T to integer 
Scale T to Integer by Q 


T+P*Q 

T-P*Q 

-T+P*Q 

-T-P*Q 

ABS (T) + ABS (P*Q)
ABS (T) - ABS (P*Q)
ABS (P*Q) - ABS (T)


Round T to Integral Value 
Reciprocal Seed (P) 


Convert T to Alternate 
Floating-point Format 


Convert T from Alternate 
Floating-point Format 
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_ TABLE 4. INSTRUCTION WORDS (Cont'd.) 
Operation





P 

-P 

ABS (P) 

Sign (T)*ABS (P)


P+T
P-T
T-P
ABS (P + T)
ABS (P - T)


P*Q 

Compare P, T 

Max P, T 

Min P, T 

Convert T to Float 
Scale T to Float by Q 
P OR T 

P AND T 


P XOR T 
NOT T (see Note 1) 


Shift P Logical Q Places 
Shift P Arithmetic Q Places 
Funnel Shift PT Q Places 
Move P 




Load Mode Register 


Notes: 1. NOT T is performed by XORing T with a word containing all 1's (integer -1). When invoking NOT T, the
user must set PSEL3-PSEL0 to 0011 (binary), thus selecting integer constant -1.
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TABLE 5. ALLOWABLE SIGN-CHANGE/CORE-OPERATION COMBINATIONS 


I5   I4-I0    Core Operation

0             FP P
0             FP P+T
0             FP P*Q
0             FP Compare P, T
0             FP Max P, T
0             FP Min P, T
0             FP Cvt T to Int
0             FP Scale T to Int
0             FP P*Q+T
0             FP Round T
0             FP Recip Seed P
0             FP Cvt T to Alt Fmt
0             FP Cvt T fm Alt Fmt

1    00000    Int P
1    00001    Int P+T
1    00010    Int P*Q
1    00011    Int Compare P, T
1    00100    Int Max P, T
1    00101    Int Min P, T
1    00110    Int Cvt T to f.p.
1    00111    Int Scale T to f.p.
1    10000    Int P OR T
1    10001    Int P AND T
1    10010    Int P XOR T
1    10011    Int Shift P Logical
1    10100    Int Shift P Arith
1    10101    Int Funnel Shift PT

x    11000    Move P
x    11111    Load Mode Reg

[Sign-Change Fields columns Sign (P), Sign (Q), Sign (T), and Sign (F): the I4-I0 codes of the floating-point rows and the V/F/x entries of the floating-point and integer rows are not legible. Move P and Load Mode Reg show x in all four sign-change fields.]


Key: V = Variable; user can specify arbitrary sign change. 
F = Fixed; user is restricted to sign change combinations shown in Table 4. 
x = Don't care; this field does not affect the operation or its result. 


Descriptions of Operations 


P (Floating-Point or Integer): The operand on port P is 
passed through the ALU to port F. This operation may be used 
to change the precision of an operand, negate an operand, 
extract the absolute value of an operand, or transfer the sign 
of operand T to operand P. 


P+T (Floating-Point or Integer): The addition operation 
(P + T) adds the operands on ports P and T, and places the 
result on port F. 


P*Q (Floating-Point or Integer): The multiplication operation 
(P*Q) multiplies the operands on ports P and Q, and places 
the result on port F. 


COMPARE P, T (Floating-Point or Integer): This operation
compares the operands on ports P and T, and places (P - T) 
on port F. One of four comparison flags (=, >, <, #) is set 
according to the result of the comparison. Note that the 
unordered flag (#) can be set only when the format selected 
is IEEE or DEC. 


MAX P, T (Floating-Point or Integer): This operation selects 
the most positive of the two operands on ports P and T, and 
places the result on port F. 


MIN P, T (Floating-Point or Integer): This operation selects
the most negative of the two operands on ports P and T, and 
places the result on port F. 


LIMIT P TO MAGNITUDE T (Floating-Point): This operation
imposes a clipping or saturation level on operand P by
comparing the magnitudes of the operands on ports P and T. If
operand P has the smaller magnitude, it is placed on port F; if
operand T has the smaller magnitude, it is placed on port F,
but with its sign modified to agree with that of operand P. This
operation is equivalent to operation SIGN(P) * MIN(ABS(P),
ABS(T)).

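The clipping rule and its stated equivalent form can be checked with a short Python model. The function names are hypothetical and the sketch works on ordinary Python floats, not on the device's operand formats.

```python
import math


def limit_to_magnitude(p, t):
    """Clip P to the magnitude of T, following the description step by step."""
    if abs(p) <= abs(t):
        return p                         # P already within the limit
    return math.copysign(abs(t), p)      # T's magnitude with P's sign


def limit_equivalent(p, t):
    """The equivalent form SIGN(P) * MIN(ABS(P), ABS(T))."""
    return math.copysign(min(abs(p), abs(t)), p)
```

Both functions return the same value for any pair of finite operands; for instance, limiting -7.5 to magnitude 2.0 gives -2.0.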

CONVERT T TO INTEGER (Floating-Point): The floating- 
point-to-integer conversion operation takes a floating-point 
operand on port T and places the equivalent two's-comple- 
ment integer value on port F. 


CONVERT T TO FLOATING-POINT (Integer): The integer- 
to-floating-point conversion operation takes a two's-comple- 
ment integer operand on port T and places the equivalent 
floating-point value on port F. 


SCALE T TO INTEGER BY Q (Floating-Point): This opera- 
tion converts the floating-point operand T to integer format 
using the floating-point operand Q as a scale factor. The true 
exponent of Q is added to the true exponent of T before the 
new value T is converted to integer format. The operation 
therefore permits T to be multiplied by any power of two when 
the source format is IEEE or DEC, and by any power of 16 
when the source format is IBM. 

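Adding the true exponent of Q to that of T is the same as multiplying T by a power of two. A minimal Python sketch for the IEEE case follows; the function names are hypothetical, Q is assumed to be nonzero, and Python's built-in `round` (round half to even) stands in for whichever rounding mode is selected.

```python
import math


def true_exponent(x):
    """True exponent of x when written as 1.f * 2**exponent (x nonzero)."""
    _, e = math.frexp(x)      # frexp gives x = m * 2**e with 0.5 <= |m| < 1
    return e - 1


def scale_to_integer(t, q):
    """Add Q's true exponent to T, then convert the scaled value to integer."""
    scaled = math.ldexp(t, true_exponent(q))   # t * 2**true_exponent(q)
    return round(scaled)
```

Only the exponent of Q takes part: `scale_to_integer(1.5, 4.0)` and `scale_to_integer(1.5, 7.0)` both scale T by 2^2.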

SCALE T TO FLOATING-POINT BY Q (Integer): This operation
converts the integer operand T to floating-point format
using the operand Q as a scale factor, where Q is a floating-point
operand in the destination format. The true exponent of
Q is added to the true exponent of T after T has been
converted from integer to floating-point. The operation
therefore permits T to be scaled by any power of two when
the destination format is IEEE or DEC, and by any power of
16 when the destination format is IBM.


(P*Q) + T (Floating-Point): This operation multiplies the oper- 
ands on port P and Q, adds the product to the operand on port 
T, and places the result on port F. 


ROUND T TO INTEGRAL VALUE (Floating-Point): This 
operation rounds a floating-point operand to an integer-valued 
floating-point operand of the same format. A value of 3.5, for 
example, would be rounded to either 3.0 or 4.0, the choice 
depending on the rounding mode. 


RECIPROCAL SEED OF P (Floating-Point): The reciprocal 
seed of the floating-point operand on port P is placed on port 
F; the result obtained is a crude estimate of the input 
operand's reciprocal. This operation can be used as the initial 
step in performing Newton-Raphson division. A single-precision
result is obtained after five iterations, and a double-precision
result after six iterations. Alternatively, an external
seed look-up table can be used for faster convergence. The
result obtained through iteration is approximate.

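The Newton-Raphson refinement of a reciprocal uses the recurrence x(n+1) = x(n) * (2 - d * x(n)), which roughly squares the relative error on every pass. The Python sketch below illustrates this with a deliberately crude seed; the seed formula is an assumption made for the example and is not the seed produced by the Am29C327.

```python
import math


def reciprocal_seed(d):
    """Crude estimate of 1/d: keep only the exponent of d (d nonzero).

    With d = m * 2**e and 0.5 <= |m| < 1, the seed 2**-e satisfies
    0.5 <= d * seed < 1, which is close enough for the iteration to converge.
    """
    _, e = math.frexp(d)
    return math.copysign(math.ldexp(1.0, -e), d)


def reciprocal(d, iterations):
    """Refine the seed with Newton-Raphson: x <- x * (2 - d * x)."""
    x = reciprocal_seed(d)
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x
```

With this seed the initial relative error is at most 1/2, so five passes bring it below 2^-24 and six passes bring it to the limit of double-precision arithmetic, in line with the iteration counts quoted above. A better seed, such as one from a look-up table, reduces the number of passes needed.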

CONVERT T TO ALTERNATE FLOATING-POINT FORMAT 
(Floating-Point): This operation converts operand T from the 
primary floating-point format to the alternate floating-point 
format, thus allowing conversions among the IEEE, DEC, and 
IBM floating-point formats. 


CONVERT T FROM ALTERNATE FLOATING-POINT FOR- 
MAT (Floating-Point): This operation converts operand T 
from the alternate floating-point format to the primary floating- 
point format, in a manner similar to that of CONVERT T TO 
ALTERNATE FLOATING-POINT FORMAT above. 


P OR T, P AND T, P XOR T, NOT T (Integer): The logical 
operations (OR, AND, EXCLUSIVE OR) are performed on the 
operands on ports P and T, and the result is placed on port F. 
NOT T is performed by XORing T with a word containing all
ones (integer -1). When invoking NOT T, instruction bits
PSEL3-PSEL0 must be set to 0011, thus selecting integer
constant -1.

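The NOT T construction can be illustrated in Python for a 32-bit operand. The names `MASK32` and `not_t` are hypothetical; the mask models the 32-bit word width because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def not_t(t):
    """Complement a 32-bit word by XORing it with integer -1 (all ones)."""
    minus_one = -1 & MASK32          # two's-complement -1 as a 32-bit pattern
    return (t ^ minus_one) & MASK32
```

For example, `not_t(0x0F0F0F0F)` returns `0xF0F0F0F0`.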

SHIFT P LOGICAL Q PLACES (Integer): This operation 
logically shifts operand P by Q places. If the shift is Q places to 
the right, Q zeros are filled from the left. If the shift is Q places 
to the left, Q zeros are filled from the right. 


SHIFT P ARITHMETIC Q PLACES (Integer): This operation 
arithmetically shifts operand P by Q places. With a right shift,
the result is sign extended Q places. With a left shift, Q zeros 
are filled from the right. 


FUNNEL SHIFT PT LOGICAL Q PLACES (Integer): The 
operands on ports P and T are concatenated to form a double- 
width operand PT, which is then shifted to the right or left by Q 
places; the 32- or 64-bit result is placed on port F. 

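The three shift operations differ only in what is shifted in. The Python sketch below models them for 32-bit operands; the function names are hypothetical, the way the device encodes shift direction in Q is not modeled (each direction has its own function), and the funnel shift is shown only for a right shift with a 64-bit result.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_left(p, q):
    """Logical or arithmetic left shift: q zeros are filled from the right."""
    return (p << q) & MASK32


def logical_right(p, q):
    """Logical right shift: q zeros are filled from the left."""
    return (p & MASK32) >> q


def arithmetic_right(p, q):
    """Arithmetic right shift: the result is sign extended q places."""
    p &= MASK32
    signed = p - (1 << 32) if p & 0x80000000 else p
    return (signed >> q) & MASK32


def funnel_right(p, t, q):
    """Concatenate P (high word) and T (low word), then shift right q places."""
    pt = ((p & MASK32) << 32) | (t & MASK32)
    return (pt >> q) & MASK64
```

Shifting `0x80000000` right by four places gives `0x08000000` logically and `0xF8000000` arithmetically.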

MOVE P (Floating-Point or Integer): The operand on port P 
is moved to port F. The operand is left unchanged, and only 
the sign flag is set. 


Operation Flags 


For each operation, the ALU produces thirteen flags that 
indicate operation status. Of the flags produced, a maximum 
of seven are relevant to any given operation. The relevant 
flags are placed in the status register, and the other flags are 
discarded. 


The ALU flags are: 


C - CARRY: Carry-out bit produced by integer addition,
subtraction, or comparison.


I - INVALID OPERATION: Input operands are unsuitable for
the operation specified (e.g., ∞ * 0).


R - RESERVED OPERAND: Reserved operand detected/
generated.


S - SIGN: Result sign.


U - UNDERFLOW: Result underflowed the destination format.


V - OVERFLOW: Result overflowed the destination format.


W - WINNER: Indicates which of the two operands was selected
when performing Max/Min operations.


X - INEXACT RESULT: Result had to be rounded to fit the
destination format.


Z - ZERO: Zero result.


>, =, <, # - GREATER THAN, EQUAL, LESS THAN,
UNORDERED: Used to report the result of a comparison
operation.


Table 6 lists the flags reported for each operation. 
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Integer 
Integer 
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Integer 
Integer 
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Note: 


TABLE 6. ORGANIZATION OF FLAGS 


Operations 


Non-arithmetic single-operand 
Operations using add 
Operations using multiply 
Compare 

Maximum, minimum, limit 
Convert/scale to integer 
Multiply/accumulate 

Round to integral value 
Reciprocal seed 

Convert to alt. f.p. format 
Convert from alt. f.p. format 


Non-arithmetic single-operand 
Operations using add 
Operations using multiply 
Compare 

Maximum, minimum, limit 
Convert/scale to integer 
Multiply/accumulate 

Round to integral value 
Reciprocal seed 

Convert to alt. f.p. format 
Convert from alt. f.p. format 


Non-arithmetic single-operand 
Operations using add 
Operations using multiply 
Compare 

Maximum, minimum, limit 
Convert/scale to integer
Multiply/accumulate 

Round to integral value 
Reciprocal seed 

Convert to alt. f.p. format 
Convert from alt. f.p. format


Non-arithmetic single-operand 
Operations using add 
Operations using multiply 
Compare 

Maximum, minimum, limit 
Convert/scale to integer 
Multiply/accumulate 

Round to integral value 
Reciprocal seed 


Convert to alt. f.p. format


Convert from alt. f.p. format 





Non-arithmetic single-operand 
Sign transfer 

Operations using add 
Operations using multiply 
Compare operations 
Maximum, minimum, limit 
Convert to float 

Scale to float 

Logical operations 
Arithmetic shift 

Funnel shift 


Move operand          11000    S
Load mode register    11111


Unused flags assume the LOW state. 
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Master/Slave Operation 


Two Am29C327 processors can be tied together in master/ 
slave configuration, with the slave checking the results pro- 
duced by the master. All input and output signals of the slave, 
with the exception of SLAVE and MSERR, are tied to the 
corresponding signals of the master. The master is selected 
by asserting signal SLAVE LOW; the slave, by asserting signal 
SLAVE HIGH. 










The slave processor, by comparing its outputs to the outputs 
of the master processor, performs a comprehensive check of 
the operation of the master processor. In addition, the slave 
processor may detect open circuits and other faults in the 
electrical path between the master processor and the system. 
Note that the master processor still performs the comparison 
between its outputs and its own internally generated results, 
and is therefore able to detect faults in its output drivers. 




APPENDICES 


APPENDIX A— DATA FORMATS 


The following data formats are supported: 32-bit integer, 64-bit
integer, IEEE single-precision, IEEE double-precision, DEC F,
DEC D, DEC G, IBM single-precision, and IBM double-precision.


The primary and alternate floating-point formats are selected
by mode register bits M0 to M3. The user may select between
floating-point operations and integer operations by means of
instruction bit I5.


The nine supported formats are described below:


Integer Formats
32-Bit Integer


The 32-bit integer word is arranged as follows:


Bit:      31     30    29   ...   1    0
Weight:  -2^31  2^30  2^29  ...  2^1  2^0


The 32-bit word is interpreted as a two's-complement integer.
For integer multiplications, the user has the option of interpreting
integers as unsigned. An unsigned single-precision integer
has a format similar to that of the two's-complement integer,
but with an MSB weight of 2^31.


64-Bit Integer


The 64-bit integer word is arranged as follows:


Bit:      63     62    61   ...   1    0
Weight:  -2^63  2^62  2^61  ...  2^1  2^0


The 64-bit word is interpreted as a two's-complement integer.
For integer multiplications, the user has the option of interpreting
integers as unsigned. An unsigned double-precision integer
has a format similar to that of the two's-complement
integer, but with an MSB weight of 2^63.


IEEE Formats
IEEE Single-Precision


The IEEE single-precision word is 32 bits wide and is arranged
in the format as follows:


Bit 31        sign (s)
Bits 30-23    biased exponent (e), bit weights 2^7 to 2^0
Bits 22-0     fraction (f), bit weights 2^-1 to 2^-23


The floating-point word is divided into three fields: a single-bit
sign, an 8-bit biased exponent, and a 23-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers. Zero may have either sign.


The biased exponent is an 8-bit unsigned integer representing
a multiplicative factor of some power of two. The bias value is
127. If, for example, the multiplicative value for a floating-point
number is to be 2^a, the value of the biased exponent is
a + 127, where "a" is the true exponent.


The fraction is a 23-bit unsigned fractional field containing the
23 least-significant bits of the floating-point number's 24-bit
mantissa. The weight of the fraction's most-significant bit is
2^-1. The weight of the least-significant bit is 2^-23.


An IEEE floating-point number is evaluated or interpreted as
follows:


If e = 255 and f ≠ 0    value = NaN                       Not-a-Number
If e = 255 and f = 0    value = (-1)^s ∞                  Infinity
If 0 < e < 255          value = (-1)^s 2^(e-127) (1.f)    Normalized number
If e = 0 and f ≠ 0      value = (-1)^s 2^(-126) (0.f)     Denormalized number
If e = 0 and f = 0      value = (-1)^s 0                  Zero


Infinity: Infinity can have either a positive or negative sign.
The interpretation of infinities is determined by the Affine/
Projective select input AFF/PROJ.


NaN: A NaN is interpreted as a signal or symbol. NaNs are
used to indicate invalid operations, and as a means of passing
process status through a series of calculations. They arise in
two ways: either generated by the Am29C327 to indicate an
invalid operation, or provided by the user as an input. A
signaling NaN has the MSB of its fraction set to 0 and at least
one of the remaining fraction bits set to 1. A quiet NaN has the
MSB of its fraction set to 1.


The IEEE format is fully described in IEEE Standard 754.


IEEE Double-Precision


The IEEE double-precision word is 64 bits wide and is
arranged in the format shown below:


Bit 63        sign (s)
Bits 62-52    biased exponent (e), bit weights 2^10 to 2^0
Bits 51-0     fraction (f), bit weights 2^-1 to 2^-52


The floating-point word is divided into three fields: a single-bit
sign, an 11-bit biased exponent, and a 52-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; zero may have either sign.


The biased exponent is an 11-bit unsigned integer representing
a multiplicative factor of some power of two. The bias
value is 1023. If, for example, the multiplicative value for a
floating-point number is to be 2^a, the value of the biased
exponent is a + 1023, where "a" is the true exponent.


The fraction is a 52-bit unsigned fractional field containing the
52 least-significant bits of the floating-point number's 53-bit
mantissa. The weight of the fraction's most-significant bit is
2^-1. The weight of the least-significant bit is 2^-52.


An IEEE floating-point number is evaluated or interpreted as
follows:


If e = 2047 and f ≠ 0    value = Reserved operand           Not-a-Number
If e = 2047 and f = 0    value = (-1)^s ∞                   Infinity
If 0 < e < 2047          value = (-1)^s 2^(e-1023) (1.f)    Normalized number
If e = 0 and f ≠ 0       value = (-1)^s 2^(-1022) (0.f)     Denormalized number
If e = 0 and f = 0       value = (-1)^s 0                   Zero


Infinity: Infinity can have either a positive or negative sign.
The interpretation of infinities is determined by the Affine/
Projective select input AFF/PROJ.


NaN: A NaN is interpreted as a signal or symbol. NaNs are
used to indicate invalid operations, and as a means of passing
process status through a series of calculations. They arise in
two ways: either generated by the Am29C327 to indicate an
invalid operation, or provided by the user as an input. A
signaling NaN has the MSB of its fraction set to 0 and at least
one of the remaining fraction bits set to 1. A quiet NaN has the
MSB of its fraction set to 1.


The IEEE format is fully described in IEEE Standard 754.


DEC Formats


DEC F


The DEC F word is 32 bits wide and is arranged in the format
shown below:


Bit 31        sign (s)
Bits 30-23    biased exponent (e), bit weights 2^7 to 2^0
Bits 22-0     fraction (f), bit weights 2^-2 to 2^-24


The floating-point word is divided into three fields: a single-bit
sign, an 8-bit biased exponent, and a 23-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; zero has a positive sign.


The biased exponent is an 8-bit unsigned integer representing
a multiplicative factor of some power of two. The bias value is
128. If, for example, the multiplicative value for a floating-point
number is to be 2^a, the value of the biased exponent is
a + 128, where "a" is the true exponent.


The fraction is a 23-bit unsigned fractional field containing the
23 least-significant bits of the floating-point number's 24-bit
mantissa. The weight of the fraction's most-significant bit is
2^-2. The weight of the least-significant bit is 2^-24.


A DEC F floating-point number is evaluated or interpreted as
follows:


If e ≠ 0                value = (-1)^s 2^(e-128) (0.1f)
If s = 0 and e = 0      value = 0
If s = 1 and e = 0      value = DEC-Reserved Operand


DEC-Reserved Operand: A DEC-Reserved Operand is interpreted
as a signal or symbol. DEC-Reserved Operands are
used to indicate invalid operations and operations whose
results have overflowed the destination format. They may also
be used to pass symbolic information from one calculation to
another.


The DEC formats are fully described in the VAX Architecture
Manual.
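The evaluation rules for the two 32-bit binary formats above can be written out as a small Python model. This is a sketch only: the function names are hypothetical, the DEC F word is taken in the bit order shown in this appendix (sign in bit 31), and a DEC-Reserved Operand is represented by `None`.

```python
def _fields32(word):
    """Split a 32-bit word into sign, 8-bit exponent, and 23-bit fraction."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    return s, e, f


def decode_ieee_single(word):
    """Evaluate an IEEE single-precision word per the rules above."""
    s, e, f = _fields32(word)
    sign = -1.0 if s else 1.0
    if e == 255:
        return float("nan") if f else sign * float("inf")
    if e == 0:
        return sign * (f / 2.0 ** 23) * 2.0 ** -126       # (0.f), covers zero
    return sign * (1.0 + f / 2.0 ** 23) * 2.0 ** (e - 127)  # (1.f)


def decode_dec_f(word):
    """Evaluate a DEC F word; returns None for a DEC-Reserved Operand."""
    s, e, f = _fields32(word)
    if e == 0:
        return 0.0 if s == 0 else None
    sign = -1.0 if s else 1.0
    # (0.1f): hidden bit of weight 2^-1, fraction bits from 2^-2 to 2^-24
    return sign * (0.5 + f / 2.0 ** 24) * 2.0 ** (e - 128)
```

The same bit pattern evaluates differently in the two formats: `0x40800000` is 4.0 as an IEEE single-precision word and 1.0 as a DEC F word.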
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DEC D 


The DEC D word is 64 bits wide and is arranged in the format
shown below:


Bit 63        sign (s)
Bits 62-55    biased exponent (e), bit weights 2^7 to 2^0
Bits 54-0     fraction (f), bit weights 2^-2 to 2^-56


The floating-point word is divided into three fields: a single-bit
sign, an 8-bit biased exponent, and a 55-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; zero has a positive sign.


The biased exponent is an 8-bit unsigned integer representing
a multiplicative factor of some power of two. The bias value is
128. If, for example, the multiplicative value for a floating-point
number is to be 2^a, the value of the biased exponent is
a + 128, where "a" is the true exponent.


The fraction is a 55-bit unsigned fractional field containing the
55 least-significant bits of the floating-point number's 56-bit
mantissa. The weight of the fraction's most-significant bit is
2^-2. The weight of the least-significant bit is 2^-56.


A DEC D floating-point number is evaluated or interpreted as
follows:


If e ≠ 0                value = (-1)^s 2^(e-128) (0.1f)
If s = 0 and e = 0      value = 0
If s = 1 and e = 0      value = DEC-Reserved Operand


DEC-Reserved Operand: A DEC-Reserved Operand is interpreted
as a signal or symbol. DEC-Reserved Operands are
used to indicate invalid operations and operations whose
results have overflowed the destination format. They may also
be used to pass symbolic information from one calculation to
another.


The DEC formats are fully described in the VAX Architecture
Manual.


DEC G


The DEC G word is 64 bits wide and is arranged in the format
shown below:


Bit 63        sign (s)
Bits 62-52    biased exponent (e), bit weights 2^10 to 2^0
Bits 51-0     fraction (f), bit weights 2^-2 to 2^-53


The floating-point word is divided into three fields: a single-bit
sign, an 11-bit biased exponent, and a 52-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; zero has a positive sign.


The biased exponent is an 11-bit unsigned integer representing
a multiplicative factor of some power of two. The bias
value is 1024. If, for example, the multiplicative value for a
floating-point number is to be 2^a, the value of the biased
exponent is a + 1024, where "a" is the true exponent.


The fraction is a 52-bit unsigned fractional field containing the
52 least-significant bits of the floating-point number's 53-bit
mantissa. The weight of the fraction's most-significant bit is
2^-2. The weight of the least-significant bit is 2^-53.


A DEC G floating-point number is evaluated or interpreted as
follows:


If e ≠ 0                value = (-1)^s 2^(e-1024) (0.1f)
If s = 0 and e = 0      value = 0
If s = 1 and e = 0      value = DEC-Reserved Operand


DEC-Reserved Operand: A DEC-Reserved Operand is interpreted
as a signal or symbol. DEC-Reserved Operands are
used to indicate invalid operations and operations whose
results have overflowed the destination format. They may also
be used to pass symbolic information from one calculation to
another.


The DEC formats are fully described in the VAX Architecture
Manual.
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“IBM Formats 
IBM Single-Precision


The IBM single-precision word is 32 bits wide and is arranged
in the format shown below:


Bit 31        sign (s)
Bits 30-24    biased exponent (e), bit weights 2^6 to 2^0
Bits 23-0     fraction (f), bit weights 2^-1 to 2^-24


The floating-point word is divided into three fields: a single-bit
sign, a 7-bit biased exponent, and a 24-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; a True-zero has a positive sign.


The biased exponent is a 7-bit unsigned integer representing a
multiplicative factor of some power of 16. The bias value is 64.
If, for example, the multiplicative value for a floating-point
number is to be 16^a, the value of the biased exponent is
a + 64, where "a" is the true exponent.


The fraction is a 24-bit unsigned fractional field containing the
24 least-significant bits of the floating-point number's 25-bit
mantissa. The weight of the fraction's most-significant bit is
2^-1. The weight of the least-significant bit is 2^-24.


An IBM floating-point number is evaluated or interpreted as
follows:


value = (-1)^s 16^(e-64) (0.f)


Zero: There are two possible classes of representations for
zero. Since there is no leading bit in the IBM format, the range
of the IBM fraction is equal to or greater than zero and less
than one. If an operation causes the fraction of the result to
cancel exactly, then the result is a floating-point zero. A True-zero
has a positive sign, a biased exponent of zero, and a
fraction of zero.


The IBM format is fully described in the IBM System/370
Principles of Operation Manual.


IBM Double-Precision


The IBM double-precision word is 64 bits wide and is arranged
in the format shown below:


Bit 63        sign (s)
Bits 62-56    biased exponent (e), bit weights 2^6 to 2^0
Bits 55-0     fraction (f), bit weights 2^-1 to 2^-56


The floating-point word is divided into three fields: a single-bit
sign, a 7-bit biased exponent, and a 56-bit fraction.


The sign bit is 0 for positive numbers and 1 for negative
numbers; a True-zero has a positive sign.


The biased exponent is a 7-bit unsigned integer representing a
multiplicative factor of some power of 16. The bias value is 64.
If, for example, the multiplicative value for a floating-point
number is to be 16^a, the value of the biased exponent is
a + 64, where "a" is the true exponent.


The fraction is a 56-bit unsigned fractional field containing the
56 least-significant bits of the floating-point number's 57-bit
mantissa. The weight of the fraction's most-significant bit is
2^-1. The weight of the least-significant bit is 2^-56.


An IBM floating-point number is evaluated or interpreted as
follows:


value = (-1)^s 16^(e-64) (0.f)


Zero: There are two possible classes of representations for
zero. Since there is no leading bit in the IBM format, the range
of the IBM fraction is equal to or greater than zero and less
than one. If an operation causes the fraction of the result to
cancel exactly, then the result is a floating-point zero. A True-zero
has a positive sign, a biased exponent of zero, and a
fraction of zero.


The IBM format is fully described in the IBM System/370
Principles of Operation Manual.

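The IBM single-precision evaluation rule, value = (-1)^s 16^(e-64) (0.f), can be modeled in a few lines of Python. The function name `decode_ibm_single` is hypothetical, and the sketch simply evaluates the formula without distinguishing a True-zero from other zero-fraction words.

```python
def decode_ibm_single(word):
    """Evaluate a 32-bit IBM single-precision word."""
    s = (word >> 31) & 1
    e = (word >> 24) & 0x7F          # 7-bit exponent, bias 64
    f = word & 0xFFFFFF              # 24-bit fraction, no hidden bit
    sign = -1.0 if s else 1.0
    return sign * (f / 2.0 ** 24) * 16.0 ** (e - 64)
```

For example, the word `0x41100000` has e = 65 and a fraction of 1/16, so it evaluates to 16^1 * 1/16 = 1.0.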




APPENDIX B— ROUNDING MODES 


The Am29C327 provides six rounding modes for floating-point 
operations, and for integer multiplication: 


Round to Nearest (IEEE) 
Round to Minus Infinity 
Round to Plus Infinity 
Round to Zero 

Round to Nearest (DEC) 
Round Away From Zero 
Illegal Value





Round to Nearest IEEE (Unbiased) 


The infinitely precise result of an operation is rounded to the
closest representable value in the destination format. If the
infinitely precise result is exactly halfway between two representations,
it is rounded to the representation having a least-significant
bit of zero. This rounding mode conforms to the
"round to nearest" mode described in the IEEE Floating-Point
Standard.


Round to Minus Infinity 


The infinitely precise result of an operation is rounded to the
closest representable value in the destination format that is
less than or equal to the infinitely precise result. This rounding
mode conforms to the "round to minus infinity" mode described
in the IEEE Floating-Point Standard.


Round to Plus Infinity 


The infinitely precise result of an operation is rounded to the
closest representable value in the destination format that is
greater than or equal to the infinitely precise result. This rounding
mode conforms to the "round to plus infinity" mode described
in the IEEE Floating-Point Standard.


Round to Zero 


The infinitely precise result of an operation is rounded to the
closest representable value in the destination format whose
magnitude is less than or equal to the infinitely precise result.
This rounding mode conforms to the "round to zero" mode
described in the IEEE Floating-Point Standard.


Round to Nearest DEC (Biased) 


The infinitely precise result of an operation is rounded to the 
closest representable value in the destination format. If the 
infinitely precise result is exactly halfway between two repre- 
sentations, it is rounded to the representation having the 
greater magnitude. This rounding mode is used by DEC VAX 
computers. 


Round Away from Zero 


The infinitely precise result of an operation is rounded to the
closest representable value in the destination format whose
magnitude is greater than or equal to the infinitely precise
result.


A graphical representation of these rounding modes is shown 
in Figures B1-1 and B1-2. 
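The six definitions above can be compared side by side with a small Python sketch. It is an illustration under stated assumptions, not the device's implementation: the destination format is modeled as the integers (so "least-significant bit of zero" becomes "even"), and the mode names and the function `round_to_integer` are hypothetical.

```python
from fractions import Fraction
import math

MODES = ("nearest_ieee", "minus_infinity", "plus_infinity",
         "zero", "nearest_dec", "away_from_zero")


def round_to_integer(x: Fraction, mode: str) -> int:
    """Round an exact value to an integer grid under one of six modes."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    below, above = math.floor(x), math.ceil(x)
    if below == above:                      # already representable
        return below
    toward_zero, away = (above, below) if x < 0 else (below, above)
    if mode == "minus_infinity":
        return below
    if mode == "plus_infinity":
        return above
    if mode == "zero":
        return toward_zero
    if mode == "away_from_zero":
        return away
    # Both round-to-nearest modes agree unless x is exactly halfway.
    if x - below != above - x:
        return below if x - below < above - x else above
    if mode == "nearest_ieee":              # tie: pick the even neighbor
        return below if below % 2 == 0 else above
    return away                             # nearest_dec tie: greater magnitude


if __name__ == "__main__":
    for value in (Fraction(5, 2), Fraction(-5, 2), Fraction(9, 4)):
        print(value, {m: round_to_integer(value, m) for m in MODES})
```

The two round-to-nearest modes differ only on exact ties, which is why the sketch handles the halfway case last.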
Figure B1-1. Graphical Interpretation of IEEE Round-to-Nearest, Round-to-Minus-Infinity, and Round-to-Plus-Infinity Rounding Modes
Figure B1-2. Graphical Interpretation of Round-to-Zero, DEC Round-to-Nearest, and Round-Away-from-Zero Rounding Modes











APPENDIX C — ADDITIONAL OPERATION 
DETAILS 


Differences Between IEEE Floating-Point
Standard and Am29C327 IEEE Operation


The IEEE floating-point standard recommends that a trapped
overflow on conversion from a binary format return a result in
that or a wider format, rounded to the destination format. The
Am29C327 returns an operand in the destination format,
rounded to that format. Note that trapped operation is an
optional aspect of the IEEE floating-point standard, and as
such, is not necessary for compliance.


Differences Between IBM 370 Floating-Point
Arithmetic and Am29C327 IBM Operation


For all arithmetic operations, the Am29C327 in general will
produce a more precise result than the IBM 370.


Differences Between DEC Floating-Point
Arithmetic and Am29C327 DEC Operation


The Am29C327 and DEC VAX floating-point formats contain
identical information, but the sub-fields of the floating-point
words are arranged differently:


The Am29C327 DEC F format is:
sign - bit 31
exponent - bits 30-23
mantissa - bits 22-0


The VAX F format is:
sign - bit 15
exponent - bits 14-7
mantissa - bits 6-0,
bits 31-16


The Am29C327 DEC D format is:
sign - bit 63
exponent - bits 62-55
mantissa - bits 54-0


The VAX D format is:
sign - bit 15
exponent - bits 14-7
mantissa - bits 6-0,
bits 31-16,
bits 47-32,
bits 63-48
bit 6 = MSB,
bit 48 = LSB
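The F-format rearrangement listed above can be sketched as a pair of Python bit-field conversions. This is a hedged illustration: the function names are hypothetical, and the sketch assumes that in the VAX F word bit 6 is the mantissa MSB and bit 16 the LSB (by analogy with the D-format note), and that bit 22 is the mantissa MSB in the Am29C327 layout.

```python
def vax_f_to_am29c327(word: int) -> int:
    """Rearrange a 32-bit VAX F word into the Am29C327 DEC F layout."""
    sign = (word >> 15) & 0x1          # VAX sign: bit 15
    exponent = (word >> 7) & 0xFF      # VAX exponent: bits 14-7
    mant_hi = word & 0x7F              # VAX mantissa, upper part: bits 6-0
    mant_lo = (word >> 16) & 0xFFFF    # VAX mantissa, lower part: bits 31-16
    mantissa = (mant_hi << 16) | mant_lo
    return (sign << 31) | (exponent << 23) | mantissa


def am29c327_to_vax_f(word: int) -> int:
    """Inverse rearrangement: Am29C327 DEC F layout back to VAX F."""
    sign = (word >> 31) & 0x1          # Am29C327 sign: bit 31
    exponent = (word >> 23) & 0xFF     # Am29C327 exponent: bits 30-23
    mantissa = word & 0x7FFFFF         # Am29C327 mantissa: bits 22-0
    mant_hi, mant_lo = mantissa >> 16, mantissa & 0xFFFF
    return (mant_lo << 16) | (sign << 15) | (exponent << 7) | mant_hi


if __name__ == "__main__":
    print(hex(vax_f_to_am29c327(0x1234C0FF)))   # 0xc0ff1234
    print(hex(am29c327_to_vax_f(0xC0FF1234)))   # 0x1234c0ff
```

Under these assumptions the conversion amounts to exchanging the two 16-bit halves of the word, which is why no information is lost in either direction.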
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ABSOLUTE MAXIMUM RATINGS 











Storage Temperature ........................ -65 to +150°C
Ambient Temperature (TA) Under Bias ........ -55 to +125°C
Supply Voltage to Ground Potential
  Continuous ............................... -0.5 to +7.0 V
DC Voltage Applied to Outputs for
  HIGH State ............................... -0.5 V to +VCC Max.
DC Input Voltage ........................... -0.5 to +5.5 V
DC Output Current, Into Outputs ............ 30 mA
DC Input Current ........................... -10 to +10 mA


Stresses above those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent device failure. Functionality
at or above these limits is not implied. Exposure to absolute
maximum ratings for extended periods may affect device
reliability.


OPERATING RANGES


Commercial (C) Devices
  Temperature (TA) ......................... 0 to +70°C
  Supply Voltage (VCC) ..................... +5 V ± 5%
    Min. ................................... +4.75 V
    Max. ................................... +5.25 V

Military (M) Devices
  Temperature (TA) ......................... -55 to +125°C
  Supply Voltage (VCC) ..................... +5 V ± 10%
    Min. ................................... +4.5 V
    Max. ................................... +5.5 V


Operating ranges define those limits between which the
functionality of the device is guaranteed.





DC CHARACTERISTICS over operating range unless otherwise specified





Output HIGH Voltage 


pe] fe 


Voc = Max 
VIN = 5.5 V 


IgH = -Q:4 mA 
Min. 
‘IN = Vir Or Vi 
VoL Output LOW Voltage =0.8 V 0.5 V 
=2.0 V 
OL = 4.0 mA 
“| Guaranteed Input Logical- 
VIH Input HIGH Level HIGH Voltage for All Inputs an an ea 
. Guaranteed input Logical-LOW 
Voc = Min. 7 
Voc = Max. s 
Voc = Max. ; 
NH Vin = 2.4 V 


OZH Voc = Max. o- HA 





7 woo Max. 
Isc ee : : 
(Note 2) Output Short-Circuit Current =OV - ~30 mA 
An Outputs 





ge 3) Power Supply Current 


Iccat : COM’ L 
(Note 4) Quiescent Power Supply Current 


icca2 ; 
(Note 5) Quiescent Power Supply Current 


COM'L 
COM'L | 


Notes: 1. For conditions shown as Min. or Max., use the appropriate value specified under Electrical Characteristics for the applicable device type.
2. Not more than one output should be shorted at a time. Duration of the short-circuit test should not exceed one second.
3. ICC is measured with clock frequency = 8 MHz and with outputs disabled. Inputs should be presented with random logic-HIGHs and LOWs to assure the toggling of internal nodes.
4. VIN ≥ VIH, VIN ≤ VIL
5. VIN ≥ VCC - 0.2 V, VIN ≤ 0.2 V
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SWITCHING CHARACTERISTICS over operating range unless otherwise specified 


Parameter Description Test Conditions | Min. =| Max. | 


CLK Period (Note 1) 
Flow-Through Mode 










Multiply-Accumulate DC ns 

All Other Operations DC ns 
Single-Pipelined Mode 

Multiply-Accumulate DC ns 

All Other Operations DC ns 


Double-Pipelined Mode 
Multiply-Accumulate 


CLK LOW Time 
CLK HIGH Time 
4 CLK Rise Time 


3 
7) 





| 


Fo 31 CLK-to-Output-Valid 
F Register Clocked 
FLAG, —6 SIGN CLK-to-Output-Vatid 
Register Clocked 
Fo 31 CLK-to-Output-Valid 
F Register Transparent 
Flow-Through Mode 


Multiply-Accumulate 
All Other Operations 


Single-Pipelined Mode 
Multiply-Accumulate 
All Other Operations 


Double-Pipelined Mode 
Multiply-Accumulate 


FLAG,—~6 SIGN 
CLK-to-Output-Valid 
S Register Transparent 


Flow-Through Mode 


Multiply-Accumulate 
All Other Operations 


Single-Pipelined Mode
Multiply-Accumulate
All Other Operations


Double-Pipelined Mode
Multiply-Accumulate


14 GEF, OES, Disab! 15 
HIGH to Z 
15 OEF, OES, Disable Time 


16 OEF, OES, Enable Time 
Z to HIGH 
17 OEF, OES, Disable Time 20 ns 
Z to LOW 
Pe | m8 


| 
| 


18 FSEL to Fo-34 
19 MSERR Data-to-Valid Delay . 


Notes: 1. CLK switching characteristics are made relative to 2.5 V.
2. CLK rise time and fall time measured between 0.8 V and (VCC - 1.0 V).
3. Data/Instruction signals include R0-31, S0-31, S/DR, S/DS, S/DF, RM0-2, PSEL0-3, QSEL0-3, TSEL0-3 and I0-13.


4. Control signals include ENR, ENS, ENF, ENRF, RFSELo-», FSEL, ENI, OEF, and OES. 
Conditions: A. All inputs/outputs except CLK are TTL-compatible for VIH, VIL, and VOH.


B. All outputs are driving 80 pF unless otherwise noted. 
C. All setup, hold, and delay times are measured relative to CLK at Vcc/2 volts unless otherwise noted. 
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SWITCHING TEST CIRCUITS 


Vec 


Voc 





TC000421 





TCRO1331 
A. Three-State Outputs B. Normal Outputs 


Notes: 1. CL = 50 pF includes scope probe, wiring, and stray capacitances without device in test fixture.
2. S1, S2, S3 are closed during function tests and all AC tests except output enable tests.
3. S1 and S3 are closed while S2 is open for tPZH test.
4. CL = 5.0 pF for output disable tests.
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SWITCHING TEST WAVEFORMS 





3V 
DATA 
INPUT Ley. 
ov 
eee 
th 
-3V 
TIMING 
INPUT ——_ } ——— 15 V 
ov 
. WFR02970 


Notes: 1. Diagram shown for HIGH data only. Output 
transition may be opposite sense. 
2. Cross-hatched area is don't care condition. 


Setup, Hold, and Release Times 





seovaptammnnrncesmmennasncamen 3 V 
SAME PHASE ___ 
INPUT TRANSITION 
PL a: 


me 15 V 


OPPOSITE PHASE __ 
INPUT TRANSITION 
een 0 V 


WFR02980 


7 








OuTPUT 


PLH 


' 





Propagation Delay 







LOW HIGH-LOW 
PULSE 





HIGH-LOW-HIGH | 
PULSE” 15 V 


WFRO02790 


Pulse Width 


3V 
CONTROL 
INPUT ery 
0 


OUTPUT 
NORMALLY 





yz 
VOH 
OUTPUT 
NORMALLY ~15V 
HIGH " So OPEN 
-~OV 


WFRO02660 





Notes: 1. Diagram shown for Input Control Enable-LOW 
and Input Control Disable-HIGH. 
2. S1, S2 and S3 of Load Circuit are closed except
where shown.


Enable and Disable Times 


CLK 


INPUTS 


MUST BE 
STEADY 


MAY CHANGE 
FROMH TOL 


MAY CHANGE 
FROM L TOH 


DON'T CARE; 
ANY CHANGE 
PERMITTED 


DOES NOT 
APPLY 


ZO 
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SWITCHING WAVEFORMS 
KEY TO SWITCHING WAVEFORMS 


OUTPUTS 


WILL BE 
STEADY 


WILL BE 
CHANGING 
FROMH TOL 


WILL BE 
CHANGING 
FROML TOH 


CHANGING; 
STATE 


UNKNOWN 


CENTER 
LINE t$ HIGH 
IMPEDANCE 


“OFF” STATE | 





KS000010 


Input Clock Timing 


WF025010 














SWITCHING WAVEFORMS (Cont'd.). 


Rosy 


So.31 





WF025020 


Timing of Operations with F Register and Status Clocked. Assumes 32-Bit Bus, Single-Cycle,
LSW-First Input Mode and Flow-Through Operation.
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CLK 


Ro.31 


So.31 


INSTRUCTION 
LINES 


SWITCHING WAVEFORMS (Cont'd.) 





WF025030 


Timing of Operations with F Register and Status Register in Feedthrough Mode. Assumes
32-Bit Bus, Single-Cycle, LSW-First Input Mode and Flow-Through Operation.
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SWITCHING WAVEFORMS (Cont'd.) 


RFSELo.9 


WF025040 


Register File Control Timing 








OEF 
HIGH LEVEL / 
Me Seeeleneateds meee ecco s Voy 0.5 V 
Fo.31 v ; OH 
Soe eae HIGH IMPEDANCE i. / 
’ Ja HIGH IMPEDANCES 
Fo.31 





LOW LEVEL 
_ WF025050 


Enable/Disable Timing for F0-31
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SWITCHING WAVEFORMS (Cont'd.) 


HIGH LEVEL 
FLAGS, SIGN 


FLAGS, SIGN 
LOW LEVEL 


Enable/Disable Timing for FLAG;~6 and SIGN 


. (— 


WF025070 


Output Selection Timing 


MASTER/SLAVE ERROR 


GO 


XX 
—e © 


MSERR / \ 


WF025080 
Master/Slave Timing (Assumes SLAVE Mode) 
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WF025060 
Advanced Micro Devices is recognized as the pioneer
and leading supplier of fast microprogrammable bit-slice
and related integrated circuits used in a wide variety of
high-performance systems.


Because of their flexibility, these microprogrammable 
ICs require a deeper understanding of hardware than 
required by a typical MOS microprocessor. But there is 
no reason to shy away from microprogramming: it is not 
difficult, and there are several hardware and software 
tools available. 


Tools that help the systems engineer design his system 
can be in the form of hardware, software, written materi- 
als, and even professional advice. The importance of 
support to any design approach, and the relative difficulty 
of microcoded design, require a detailed explanation. 


As more support is provided to the customer, ease-of- 
design improves and time-to-market decreases. The 
design process becomes less tedious, risk is reduced, 
and a lower skill level is required of the designer to 
implement a successful system. In general, the more 
rigid a device family becomes (i.e., fixed architecture/ 
fixed instruction set), the easier it is to support. 


When assessing the support available for a design
approach, consideration needs to be given to the realities
of the situation. For instance, building blocks offer a
flexibility in architecture and programming that can only
be equaled in gate arrays (which can be even more
versatile). The informed engineer would not ask the
question, “Can I get compiler support for what I build with
gate arrays?” The answer would obviously be, “Only if
you emulated something that was already supported, or
targeted a compiler to your new creation.” Until tools
become available that automatically generate compilers,
it will remain the case that more flexible approaches get
you closer to the hardware and away from higher level
language, and usually result in better performance.


It is impossible to even imagine all of the various ways a
microcoded system might be constructed. Further, since
the architecture is not fixed, it is not possible to pre-define
a compiler or assembler for the system. If the full flexibility
of the microprogrammed-building-block approach is to
be maintained, then a penalty must be paid in terms of
a lack of high-level language support. Fortunately, a
good meta assembler greatly alleviates the programming
task. Of course, once a system is defined, a
compiler may be developed, but not cheaply. With these
tradeoffs now in mind, we can present tools available to
the Am29300/29C300 family.


5.1 Am29C300 EVALUATION BOARD 


The Am29C300 Evaluation Board is an educational tool 
to help the user understand the Am29C300 32-bit build- 
ing-block family. With all the major devices of the 
Am29C300 family and an on-board debug monitor, the 
board provides an excellent tool for those who would like 
to learn more about the Am29C300 family. A block 
diagram of the board is shown in Figure 5-1. 


The board consists of two systems: the 80188 and 
Am29C300 system. The 80188 system is a front-end 
processor which provides the necessary interface be- 
tween the board and external sources, such as a CRT 
terminal. Through a parallel interface between the 80188 
system and the Am29C300 system, the 80188 system 
can control and monitor the activity of the Am29C300 
system, which is a 32-bit system with three major parts: 
a computer control unit, an execution unit, and memory. 


Am29C300 System 


Following a standard computer architecture, the computer
control unit provides all the control signals for the
Am29C300 system. It includes several major hardware
blocks: sequencer (Am29C331), writable control store,
pipeline register, interrupt controller, and macro-instruction
register. Its operation follows a standard procedure.
First, it fetches a macro instruction and stores it in the
macro-instruction register; then, the opcode of the macro
instruction is decoded to find the correct microroutine for
the macro instruction. Finally, the selected microroutine
controls the operations of the execution unit and the
memory.


With the building blocks of the Am29C300 family, a 
powerful execution unit has been implemented on the 
board. The execution unit is able to handle 32-bit arith- 
metic and logic operations, multi-precision multiplication 
and division, and single-precision floating-point calcula- 
tions within a reasonable time period. Also, the execution 
unit has 64 32-bit registers in which to store data. The 
following Am29C300 building blocks have been included 
in the execution unit: 


• Am29C334 - 64 x 18 Bit Dual-Access Four-Port
  Register File

• Am29C332 - 32-Bit Arithmetic Logic Unit

• Am29C323 - 32-Bit Parallel Multiplier

• Am29C325 - Single-Precision Floating-Point
  Processor
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Register Files (64 * 32 BITS) 
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Figure 5-1. Am29C300 Evaluation Board 
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Am29C323 - Multiplier 


RND 


PSEL1 
PSELO 
ACC1 
ACCO 
XSEL 
TOX 


/ENXA 
/ENXB 
YSEL 
TCY 


FTY 
/ENYA 
/ENYB 
TSEL 
/ENT 
/ENP 
/ENI 


FTP 


X : Dont Care 
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ICIN 
INT-13 
INT_12 
INT_11 
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INTEN 


< 
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Am29114 - Interrupt Controller 


LPRIISSAaNaNaAatATSLRSRRSLSSSLSBLK 
Am29C331 - Sequencer 
Q 


290334 - Register File 
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/PECLR 
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Figure 5-2. Am29C300 EVB Microcode BIT Map 
[Figure 5-2 field widths: Sequencer & Interrupt Controller, 32 bits; Register A, 12 bits; Register B, 12 bits; Execution, 23 bits; Control, 17 bits]
The memory architecture is very straightforward. It
includes 12 static RAMs and a control PAL. Three bits of
the microcode are decoded by the control PAL to generate
chip selects and write pulses for the RAMs. A register
in the execution unit should act as a program counter to
provide addresses for the RAMs.


Microcode 


The 96-bit wide microcode is divided into five major 
fields: sequencer and interrupt field, register A field, 
register B field, execution field, and control field. A 
detailed microcode format is shown in Table 5-1. 
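The five-field split of the 96-bit microword can be sketched as a pack/unpack pair in Python. This is only an illustration: the field names are hypothetical, and the sketch assumes the sequencer and interrupt field occupies the most significant bits, following the left-to-right order of Table 5-1. The actual bit assignment on the board is given by the microcode bit map, not by this sketch.

```python
# (name, width in bits), most significant field first (assumed ordering).
FIELDS = (
    ("seq_int", 32),     # sequencer and interrupt controller
    ("reg_a", 12),       # register A (source)
    ("reg_b", 12),       # register B (source and destination)
    ("execution", 23),   # ALU, multiplier, floating-point processor
    ("control", 17),     # remaining control bits
)
WORD_BITS = sum(width for _, width in FIELDS)   # 96


def pack_microword(**values: int) -> int:
    """Pack named field values into one 96-bit integer; omitted fields are 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_microword(word: int) -> dict:
    """Split a 96-bit integer back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


if __name__ == "__main__":
    w = pack_microword(seq_int=0x30, reg_a=5, reg_b=7, execution=0x5F)
    print(f"{w:024X}")            # 24 hex digits = 96 bits
    print(unpack_microword(w))
```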


Monitor 


The monitor of the Am29C300 evaluation board is implemented
in C and controlled by the 80188 system. It
provides a limited microcode assembler and disassembler,
a download and upload utility, and a microcode
debugger. The debugger includes various useful features
such as single step, break point, and display of
register contents.


5.2 Am29300 TEST BOARD 


With the increasing complexity of integrated circuits, it is
often necessary to check the functionality of an IC. The
IBM PC board allows the user to functionally check any
Am29300 family device by writing input test vectors. The
software accompanying the board takes these input
vectors one at a time, applies them to the device under
test, clocks the device, and produces output vectors.
Figure 5-3 shows the architecture of the board. As stated
above, the intention is to allow users to familiarize
themselves with the functionality of the part. AC specs cannot
be verified. Sample input and output files for the
Am29331 are also shown.
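The apply, clock, and capture loop described above can be sketched in Python. Everything in this sketch is hypothetical: the real board software and its file format are not reproduced here, and `ToyCounter` is only a stand-in device model (a 4-bit counter with synchronous reset) used to make the loop runnable.

```python
class ToyCounter:
    """Stand-in device under test: 4-bit counter with synchronous reset."""

    def __init__(self) -> None:
        self.count = 0
        self.reset = 0

    def apply(self, inputs: dict) -> None:
        """Present one input vector on the device pins."""
        self.reset = inputs["reset"]

    def clock(self) -> None:
        """Advance the device by one clock edge."""
        self.count = 0 if self.reset else (self.count + 1) & 0xF

    def read(self) -> dict:
        """Sample the device outputs."""
        return {"count": self.count}


def run_vectors(dut, vectors):
    """Apply each input vector in turn, clock once, and record the outputs."""
    outputs = []
    for vector in vectors:
        dut.apply(vector)
        dut.clock()
        outputs.append(dut.read())
    return outputs


if __name__ == "__main__":
    stimulus = [{"reset": 1}, {"reset": 0}, {"reset": 0}]
    for line, result in enumerate(run_vectors(ToyCounter(), stimulus), start=1):
        print(f"{line:03d}", result)
```

Because each vector is followed by exactly one clock, the sketch checks function only, in keeping with the statement that AC specs cannot be verified this way.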


Table 5-1

Width     Field                                  Devices
32 Bits   Sequencer & Interrupt Controller       Am29C331, Am29114
12 Bits   Register A (Source)                    Am29C334 A Port
12 Bits   Register B (Source & Destination)      Am29C334 B Port
23 Bits   Execution                              Am29C332, Am29C323, Am29C325
17 Bits   Control                                Am2925












PGA ZF 
U1 
(Vcc, GND Header) 


28536 x 16 | 
PARALLEL I/O 
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Figure 5-3. Am29300 Testboard — Block Diagram 
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Support Tools 
Am29331 Input File 
socket 120 
63,76,120, 96, 95,83, 82 
107,93,79,80, 81,67 
69,94,68,62 
61,55,39,57,56, 43, 44, 45, 37,38,25,26 
65,60,48,28, 64,53, 40,27,58,52, 41,14,59,47,42,1 
108,85,86,100,114, 90,104,105,24,10,8,20,19,30,29,2 
109,98,99,88,115,103,117,106,12,23,22,33,6,18,4,15 
46,77 
97,111,113,101,102, 91, 92,119,13,36,35,21,7,31,17,16 
84,72,78,71 
73,74,89,34,66,75,11,3,118,110 
32,87,54,49,51,50,5,116,9,112,70; 
MMMM 
321 0 
T ay SA tat ee A Y A 
S I Tt -$,.1 3.3.3. 35. 1 1 /-EE 
/ HLNI5 31 -~e- - 5 5 / ) IF RQ 
R OATN- -- 3210- = Cg) NURUV G 

CS FLVETI ST ae - D A IE Y TLOAC N 

PTCDENRO O 0 00000 0 ND 0 ALRLC D 
:specify base for each column 
& BBBBBBB QOH H HHH H H H H HHHH HHHH B B HHHH B B B B QHH OHH 
:specify pin direction for each column 
% IIITIIIIIII III I III IIII IIII I I 0000 0 0 0 0 000 000 
:RESET 
001 w 0 X O X X X XX X XXK X X X OX XXXX XXXX X DN OOOO O O O O OOO ODO OO1l A 
:CONTINUE, BRCC_D, CONTINUE 
002 w100010 30 X XXX X X X X XXXX XXXX 0 0 0000 0 0 0 0 000 OOO OO1l A 
003 w10001 0 00 0 001 X X X X 8971 XXXX 0 0 0000 0 0 0 O 000 O00 OO1l A 
004 wi100010 30 X XXX X X X X XXXX XXXX 0 0 0000 O 0 O O 000 000 O04 A 
005 wil1000t10 30 X XXX X X X XK XXXX XXXX 0 0 OOOO 0 0 0 O ODD ODO 003 L 
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Am29331 Output File 


socket 120 

63, 76,120, 96, 95, 83, 82 

107,93,79,80, 81,67 

69,94, 68,62 

61,55,39,57,56,43, 44, 45,37,38,25,26 

65,60, 48,28, 64,53, 40,27, 58,52, 41,14,59,47,42,1 
108,85, 86,100,114, 90,104,105,24,10,8,20,19,30,29,2 
109, 98,99, 88,115,103,117,106,12,23,22,33,6,18,4,15 
46,77 
97,111,113,101,102,91,92,119,13,36,35,21,7,31,17,1 
84,72,78,71 
73,74,89,34,66,75,11,3,118,110 
32,87,54,49,51,50,5,116,9,112,70; 


MMMM 
3210 
T aS 33 D A = A 
Se LS. 3 33. 3° 4 1 1 /-EE 
/ HLNI5 31 Sy ey Set es 5 / 5 IF RQ 
R OAT N= =< S32 LOS i CO = NURUV G 
CSFLVETI ST « ec ee aD A IE Y TLOAC § N 
PTCDENRO O 0 00000 0 ND 0 ALRLBLC D 


:specify base for each column 

& BBBBBBB QH H HHH H H H H HHHH HHHH B B HHHH B B B B QHH OHH 
:specify pin direction for each column 

% IIIIIIIIII III IIITI IIII IIII I I 0000 0 0 O O 000 000 
:RESET 

001 w0 0000 0 00 0 000 0 0 0 0 0000 0000 0 0 0000 10 0 0 3FF 000 - 001 


:CONTINUE, BRCC_D, CONTINUE 


002 


wl1000i10 30 0 000 0 0 0 0 0000 0000 0 0 0001 10 0 0 3FF 000 - 001 
003 w10001 0 00 0 001 0.0 0 0 8971 0000 0 0 8971 10 0 0 3FF 000 - O01 
004wi100010 30 0 000 0 00 0 0000 0000 0 0 8972 10 0 0 3FF 000 - 001 
004 wil10001 0 30 0 000 0 0 0 0 0000 0000 0 0 8973 10 0 0 3FF 000 - 002 
004 wil10001 0 30 0 000 0 00 0 0000 0000 0 0 8974 10 0 0 3FF 000 - 003 
004 wi10001 0 30 0 000 0 0 0 0 0000 0000 0 0 8975 10 0 0 3FF 000 — 004 
005 w100010 30 0 000 0 0 0 0 0000 0000 0 0 8978 10 0 0 3FF 000 - 003 










5.3 Am29300 DEFINITION FILE 


Introduction


The definition file contains the description of the micro
machine for which assemblies are to be performed. Its
innate flexibility allows the assembler to be retargeted to
support any given bit-slice microprocessor machine and
instruction format. The definition file is composed of:


• Instruction Definition

• Macro Definitions


The definition file is stored on a floppy disk and can be
requested from your local AMD sales office.


Instruction Definition 


The instruction definition defines a name for the instruc- 
tion, the length of the instruction, the fields of the instruc- 
tion and variation in format, allowable values for each 
field, and default values for each field. 


The instruction definition contains:

• Field Definitions

• Case Definitions


Field Definition 


A field in a microinstruction is a group of bits that are 
logically related and are manipulated as a unit. The form 
of the field definition is: 


<fielddef 1> <descript 1>
             <descript 2> (<const 1> : <id 1>,
                           <const 2> : <id 2>,
                           <const m> : <id m>)


<fielddef i> is a name of a field definition to be defined.
<const i> is an integer-valued expression of an identifier.
<id i> defines a name of an identifier. A descriptor
<descript> specifies the size and location of the field and
assigns valid values for the field. Valid descriptors are as
follows:


Bits:        Bits that make up a field
Length:      Length of a field
Default:     Default values for a field
Values:      Definitions of names for field values
Invert:      One's complement field values
Complement:  Two's complement field values
Mask:        Use low bits of value, ignore high-order bits
Reverse:     Reverse order of bits in field
Valid:       A list of valid values for the field
Display:     Display mode for debugging


The following is an example of the field definition for the 
Am29332: 


Am29332:length (7)
        values (H'00': ZERO-EXTA
                H'01': ZERO-EXTB


                H'5F': SMULFIRST)


The name of the field may be any sequence of characters.
Constants may be specified in hexadecimal, decimal,
octal, binary, or ASCII characters. Each of the
'values' definitions consists of a constant followed by a
colon and a symbol that will represent the constant's
value when assigned to the field.
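How a field definition resolves a symbol to a constant can be sketched in Python. This is an illustration, not the assembler's implementation: the class `FieldDef` and its `encode` method are hypothetical, and only the length, values, and default descriptors are modeled. The Am29332 symbols are the three shown in the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FieldDef:
    """Minimal model of a field definition: length, values, default."""
    name: str
    length: int                                  # field width in bits
    values: Dict[str, int] = field(default_factory=dict)
    default: Optional[str] = None                # symbol used when none is given

    def encode(self, symbol: Optional[str] = None) -> int:
        """Return the constant assigned to the field for a symbol."""
        chosen = symbol if symbol is not None else self.default
        if chosen is None:
            raise ValueError(f"{self.name}: no value given and no default")
        if chosen not in self.values:
            raise KeyError(f"{self.name}: undefined symbol {chosen!r}")
        constant = self.values[chosen]
        if not 0 <= constant < (1 << self.length):
            raise ValueError(f"{self.name}: {constant} exceeds {self.length} bits")
        return constant


am29332 = FieldDef("Am29332", 7, {"ZERO-EXTA": 0x00,
                                  "ZERO-EXTB": 0x01,
                                  "SMULFIRST": 0x5F})

if __name__ == "__main__":
    print(hex(am29332.encode("SMULFIRST")))   # 0x5f
```

Checking each constant against the declared length catches a value table that does not fit its field before any microcode is generated.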


Case Definitions 


The case definition is used to describe multiple formats 
for the microinstruction word. A microinstruction may 
have different interpretations of certain fields, depending 
upon other fields. The case definition provides a way of 
making this form of differentiation formal. The specifica- 
tion is such that if the selector field has a specific value, 
only one of the alternate field definitions is valid and all 
the others are undefined. 


The case statement is introduced by 'case' and followed
by an optional selector field name. Following this are
one or more case entries. A case entry consists of a value
or list of values of the selector field and a 'begin-end'
block containing the description of the fields that are
defined for this value.


The form of a case definition is as follows: 


case {<selector>} of
    <casevalue1> :begin
                  <fielddescrs>
                  end;
    <casevalue2> :begin
                  <fielddescrs>
                  end;
endcase;


<selector> is an optional field that is set depending upon
which case branch is selected. <casevalue1> is a value
of the selector that selects the branch and is used
for verification. <fielddescrs> is a field definition. An
example is:


sel : length (1);
case sel of
    0 : begin
        addr :length (8);
        cntr :length (8);
        end;
    1 : begin
        data :length (8);
        end;
endcase;


This structure corresponds to the following overlaid
microinstruction:


O98 765432 1 


65 4 3 2 


0 (bit position) 
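The overlay selected by the case example can be sketched in Python. The packing order is an assumption made for illustration: the sketch places sel in the most significant bit of a 17-bit word and lays the fields of the chosen branch after it in declaration order, so that data overlays the position of addr. The function name `encode_case` is hypothetical.

```python
# Fields valid for each value of the selector, in declaration order.
CASES = {
    0: (("addr", 8), ("cntr", 8)),
    1: (("data", 8),),
}
WORD_BITS = 1 + 16   # sel plus the widest branch


def encode_case(sel: int, **fields: int) -> int:
    """Encode one microinstruction word for the chosen case branch."""
    if sel not in CASES:
        raise ValueError(f"no case entry for sel = {sel}")
    layout = CASES[sel]
    undefined = set(fields) - {name for name, _ in layout}
    if undefined:
        # Fields of the other branch are undefined for this selector value.
        raise ValueError(f"fields undefined when sel = {sel}: {sorted(undefined)}")
    word, used = sel, 1
    for name, width in layout:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
        used += width
    return word << (WORD_BITS - used)   # left-justify shorter branches


if __name__ == "__main__":
    print(f"{encode_case(0, addr=0xAB, cntr=0xCD):05X}")   # 0ABCD
    print(f"{encode_case(1, data=0xAB):05X}")              # 1AB00
```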





Macrodefinitions


The macrodefinition language is very simple, consisting of
field assignments. It is based upon the instruction
definitions discussed above and is user-definable, depending
upon the particular architecture.


All instructions are a sequence of phrases, each of which
is either a field assignment or a macro call. The following
is the form of macrodefinitions:


macro <op> &<var 1> &<var 2>
begin
    <fielddef 1>=<id k>,
    <fielddef i>=&<var j>
endm;


<op> is a name of the macro. &<var j> is a macro variable
that may be local to a particular macro or accessible by
any other macro that defines the same global macro
name. The following is an example for the Am29331:


macro call &dest;
begin
    data=&dest,
    Am29331=CALL
endm;


In this case, the Am29331 is set for a subroutine call
instruction CALL and the microprogram branches to the
address specified by &dest. Other conditions are default
as given by the Am29331 instruction definition.
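Macro expansion of this kind amounts to substituting arguments into a list of field assignments. The Python sketch below illustrates the idea for the call macro; the table `MACROS` and the function `expand` are hypothetical names, and the sketch does not model defaults or nested macro calls.

```python
# Each macro: (parameter names, {field: symbol or "&parameter"}).
MACROS = {
    "call": (("dest",), {"data": "&dest", "Am29331": "CALL"}),
}


def expand(name: str, *args):
    """Expand a macro call into a dictionary of field assignments."""
    params, body = MACROS[name]
    if len(args) != len(params):
        raise TypeError(f"{name} expects {len(params)} argument(s)")
    binding = {"&" + param: arg for param, arg in zip(params, args)}
    # Replace "&var" placeholders by the bound argument; keep symbols as-is.
    return {fld: binding.get(value, value) for fld, value in body.items()}


if __name__ == "__main__":
    print(expand("call", 0x120))   # {'data': 288, 'Am29331': 'CALL'}
```

Fields not mentioned in the expansion would then take the defaults from the instruction definition, as the text above describes.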


; AMDASM definitions for Am29114 Real Time Interrupt Controller


          WORD 4

MCLR:     EQU H#0     ; Master clear
CHSR:     EQU H#1     ; Clear highest in service reg
CCIR:     EQU H#2     ; Clear highest in interrupt reg
NOOP:     EQU H#3     ; No operation
BSMK:     EQU H#4     ; Set mask reg from D-Bus
BCMK:     EQU H#5     ; Clear mask reg from D-Bus
LDMK:     EQU H#6     ; Load mask reg from D-Bus
RDMK:     EQU H#7     ; Read mask reg to D-Bus
BSSR:     EQU H#8     ; Set in service reg from D-Bus
BCSR:     EQU H#9     ; Clear in service reg from D-Bus
LDSR:     EQU H#A     ; Load in service reg from D-Bus
RDSR:     EQU H#B     ; Read in service reg to D-Bus
BSIR:     EQU H#C     ; Set interrupt reg from D-Bus
BCIR:     EQU H#D     ; Clear interrupt reg from D-Bus
LDIR:     EQU H#E     ; Load interrupt reg from D-Bus
RDIR:     EQU H#F     ; Read interrupt reg to D-Bus

INT.CNTL: DEF 4VH#3   ; Default to no operation





; AMDASM definitions for Am29331 Microprogram Sequencer


          WORD 14


; Am29331 bit fields:
;    0     - FC    - Force continue
;    1     - CIN   - Increment carry in
;    2-7   - I0-I5 - Instruction
;    8     - INTEN - Interrupt enable
;    9     - OE    - D-Bus Output enable
;    10-13 - S0-S3 - Test select


; FC values:

FCONT:    EQU B#1     ; Force continue


; CIN values:

CINCR:    EQU B#0     ; Increment by one
CNINCR:   EQU B#1     ; Don't increment


; Condition control (COND) (I4-I5)

TRUE:     EQU B#00    ; Branch on true
FALSE:    EQU B#01    ; Branch on false
ALWAYS:   EQU B#10    ; Branch always


; Address source (ADDR) (I2-I3)

D.BUS:    EQU B#00    ; Address source - D-Bus
A.BUS:    EQU B#01    ; Address source - A-Bus
MULTW:    EQU B#10    ; Address source - Multiway
STACK:    EQU B#11    ; Address source - Stack


; Sequencer operation (SEQ) (I0-I1)

BRA:      EQU B#00    ; Branch
CALL:     EQU B#01    ; Call
EXIT:     EQU B#10    ; Exit
DUMP:     EQU B#11    ; Decrement counter and jump
; Sequencer special instructions (I0-I5)


CONT:     EQU 6H#30:  ; Continue
FOR.D:    EQU 6H#31:  ; For D
DECR:     EQU 6H#32:  ; Decrement counter
LOOP:     EQU 6H#33:  ; Loop
POP.D:    EQU 6H#34:  ; Pop stack to D
PUSH.D:   EQU 6H#35:  ; Push D on stack
RESET.SP: EQU 6H#36:  ; Reset stack pointer
FOR.A:    EQU 6H#37:  ; For A
POP.C:    EQU 6H#38:  ; Pop stack to Counter
PUSH.C:   EQU 6H#39:  ; Push Counter to stack
SWAP:     EQU 6H#3A:  ; Exchange Ctr and TOS
STACK.C:  EQU 6H#3B:  ; Push Ctr & Load Ctr. D
LOAD.D:   EQU 6H#3C:  ; Load Ctr from D
LOAD.A:   EQU 6H#3D:  ; Load Ctr from A
BSET:     EQU 6H#3E:  ; Load Comp Reg from D
CLEAR:    EQU 6H#3F:  ; Disable Comparator


; Test conditions (S0-S3)


T0:       EQU H#0     ; Test T0
T1:       EQU H#1     ; Test T1
T2:       EQU H#2     ; Test T2
T3:       EQU H#3     ; Test T3
T4:       EQU H#4     ; Test T4
T5:       EQU H#5     ; Test T5
T6:       EQU H#6     ; Test T6
T7:       EQU H#7     ; Test T7
T8:       EQU H#8     ; Test T8 ==
CARRY:    EQU H#8     ; Carry
T9:       EQU H#9     ; Test T9 ==
SIGN:     EQU H#9     ; Negative sign
T10:      EQU H#10    ; Test T10 ==
OVER:     EQU H#10    ; Overflow
T11:      EQU H#11    ; Test T11 ==
ZERO:     EQU H#11    ; Zero or Equal
ULTB:     EQU H#12    ; C+Z Uns LT, borrow
ULT:      EQU H#13    ; ~C+Z Uns LT
LT:       EQU H#14    ; N * V - Signed LT
LE:       EQU H#15    ; (N * V) + Z - LE
; Definitions for conditional sequencer operations


; (interrupts disabled)
SEQ:      DEF B#0, B#1, 2VB#11, 2VB#00, 2VB#00, B#0, B#1, 4VH#0
;             FC   CIN  COND    ADDR    SEQ     INTEN DOE  TEST

; (interrupts enabled)
SEQI:     DEF B#0, B#1, 2VB#11, 2VB#00, 2VB#00, B#1, B#1, 4VH#0
;             FC   CIN  COND    ADDR    SEQ     INTEN DOE  TEST


; Definitions for special sequencer operations


; (interrupts disabled)
SSEQ:     DEF B#0, B#1, 6VH#30:, B#0, B#1, 4VH#0
;             FC   CIN  I0-I5    INTEN DOE  TEST

; (interrupts enabled)
SSEQI:    DEF B#0, B#1, 6VH#30:, B#1, B#1, 4VH#0
;             FC   CIN  I0-I5    INTEN DOE  TEST

          END
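The way the COND, ADDR, and SEQ sub-fields combine into the six-bit Am29331 instruction (I0-I5) can be sketched in Python. The sketch takes the two-bit codes from the listing above, reading all of them as binary values 00 to 11, and assumes I0 is the least significant bit, so that COND (I4-I5) forms the top two bits. The function name is hypothetical.

```python
COND = {"TRUE": 0b00, "FALSE": 0b01, "ALWAYS": 0b10}                 # I4-I5
ADDR = {"D.BUS": 0b00, "A.BUS": 0b01, "MULTW": 0b10, "STACK": 0b11}  # I2-I3
SEQ = {"BRA": 0b00, "CALL": 0b01, "EXIT": 0b10, "DUMP": 0b11}        # I0-I1

# Special instructions occupy the codes whose COND bits are both 1.
SPECIAL = {"CONT": 0x30, "LOOP": 0x33, "CLEAR": 0x3F}   # subset of the listing


def conditional_instruction(cond: str, addr: str, seq: str) -> int:
    """Build the 6-bit I0-I5 code for a conditional sequencer operation."""
    return (COND[cond] << 4) | (ADDR[addr] << 2) | SEQ[seq]


if __name__ == "__main__":
    # Unconditional subroutine call with the address taken from the D-Bus.
    print(hex(conditional_instruction("ALWAYS", "D.BUS", "CALL")))   # 0x21
    # Branch on true to the multiway address.
    print(hex(conditional_instruction("TRUE", "MULTW", "BRA")))      # 0x8
```

Under this reading, the codes 30 through 3F (hex) never collide with a conditional operation, because no COND symbol uses the pattern 11.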
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[III III CII III ICI III III IIIA AKT T AIK KICK 
{ 4} 
{ MCASM (Microtec Assembler) . | } 
{ Definitions for Am29323 32-bit Parallel Multiplier } 
{ } 


[ERK KKKKKK KK KKK KKK KKK KK RRR RRR KK KKK KK RAK RIK KKK KK KKK KKK KKK KKK KKK KAKA KK KK KKK } 


rnd:    length (1),                  { Round control }
        values (0 : inactive, 1 : active),
        default (inactive);

format: length (1),                  { Format adjust }
        values (0 : fractional, 1 : signed),
        default (signed);

psel:   length (2),                  { Output control }
        values (0 : temp,            { Temp reg }
                1 : low,             { Lower half }
                2 : high,            { Upper half }
                3 : none),           { No output }
        default (none);

acc:    length (2),                  { Accumulator control }
        values (0 : pass,
                1 : accum,
                3 : shift),
        default (pass);

xsel:   length (1),                  { Select X register }
        values (0 : XB, 1 : XA),
        default (XA);

tcx:    length (1),                  { X mode control }
        values (0 : unsigned, 1 : signed),
        default (signed);

ftx:    length (1),                  { Feedthru control for X regs }
        values (0 : registered, 1 : transparent),
        default (registered);

enx:    length (2),                  { Load XA and XB regs }
        values (0 : both,
                1 : XA,
                2 : XB,
                3 : none),
        default (none);

ysel:   length (1),                  { Select Y register }
        values (0 : YB, 1 : YA),
        default (YA);

tcy:    length (1),                  { Y mode control }
        values (0 : unsigned, 1 : signed),
        default (signed);

fty:    length (1),                  { Feedthru control for Y regs }
        values (0 : registered, 1 : transparent),
        default (registered);

eny:    length (2),                  { Load YA and YB regs }
        values (0 : both,
                1 : YA,
                2 : YB,
                3 : none),
        default (none);

tsel:   length (1),                  { Temporary reg load select }
        values (0 : low,             { Lower half }
                1 : high),           { Upper half }
        default (low);

ent:    length (1),                  { Load temporary reg }
        values (0 : load, 1 : hold),
        default (hold);

eni:    length (1),                  { Load instruction reg }
        values (0 : load, 1 : hold),
        default (hold);

enp:    length (1),                  { Load accumulator }
        values (0 : load, 1 : hold),
        default (hold);

fti:    length (1),                  { Feedthru control for inst reg }
        values (0 : registered, 1 : transparent),
        default (registered);

ftp:    length (1),                  { Feedthru control for accum }
        values (0 : registered, 1 : transparent),
        default (registered);
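
The way such field definitions turn into microword bits can be sketched in Python. This is only an illustration, not MCASM itself: the field lengths, symbolic values, and defaults for five of the fields are copied from the definitions above, while the packing order (declaration order, first field in the most significant bits) and the helper name `assemble` are assumptions made for the example.

```python
# Subset of the Am29323 field table: (name, length, symbolic values, default).
# Packing order is an ASSUMPTION: fields are packed in declaration order,
# with the first field ending up in the most significant bits.
FIELDS = [
    ("rnd",    1, {"inactive": 0, "active": 1},                "inactive"),
    ("format", 1, {"fractional": 0, "signed": 1},              "signed"),
    ("psel",   2, {"temp": 0, "low": 1, "high": 2, "none": 3}, "none"),
    ("acc",    2, {"pass": 0, "accum": 1, "shift": 3},         "pass"),
    ("enx",    2, {"both": 0, "XA": 1, "XB": 2, "none": 3},    "none"),
]


def assemble(assignments):
    """Pack symbolic field assignments into an integer, applying defaults."""
    known = {name for name, _, _, _ in FIELDS}
    unknown = set(assignments) - known
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, length, values, default in FIELDS:
        symbol = assignments.get(name, default)
        if symbol not in values:
            raise ValueError(f"{symbol!r} is not a legal value for {name}")
        word = (word << length) | values[symbol]
    return word


if __name__ == "__main__":
    print(hex(assemble({})))                                # all defaults
    print(hex(assemble({"acc": "accum", "psel": "high"})))  # two overrides
```

Unassigned fields fall back to their declared defaults, which is why a microinstruction only needs to name the fields it changes.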
{**********************************************************************}
{                                                                      }
{ MCASM (Microtec Assembler)                                           }
{ Macros for Am29323 32-bit Parallel Multiplier                        }
{                                                                      }
{**********************************************************************}

{**********************************************************************}
{                                                                      }
{ Load X Register                                                      }
{                                                                      }
{**********************************************************************}
macro loadx &X &mode;
begin
   output ("enx = &X, tcx = &mode");
end

{**********************************************************************}
{                                                                      }
{ Load Y Register                                                      }
{                                                                      }
{**********************************************************************}
macro loady &Y &mode;
begin
   output ("eny = &Y, tcy = &mode");
end

{**********************************************************************}
{                                                                      }
{ Load Temp Register                                                   }
{                                                                      }
{**********************************************************************}
macro loadT &mode;
begin
   output ("ent = load, tsel = &mode");
end

{**********************************************************************}
{                                                                      }
{ Select X & Y registers                                               }
{                                                                      }
{**********************************************************************}
macro selXY &X &Y;
begin
   output ("xsel = &X, ysel = &Y");
end

{**********************************************************************}
{                                                                      }
{ Multiplier function                                                  }
{                                                                      }
{**********************************************************************}
macro mul &A &mode;
begin
   output ("acc = &A, enp = load, psel = &mode, eni = load");
end
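
The macros above work by substituting their parameters into a template of field assignments. A minimal Python sketch of that substitution follows; it is an approximation for illustration, not the MCASM macro processor. The templates are copied from the `loadx` and `mul` macros, while the `MACROS` table layout and the `expand` function are names invented for this example.

```python
import re

# Macro name -> (parameter names, output template), copied from the listing.
MACROS = {
    "loadx": (["X", "mode"], "enx = &X, tcx = &mode"),
    "mul":   (["A", "mode"], "acc = &A, enp = load, psel = &mode, eni = load"),
}


def expand(line):
    """Expand one macro call such as 'mul accum high' into field assignments."""
    name, *args = line.split()
    params, template = MACROS[name]
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} argument(s)")
    binding = dict(zip(params, args))
    # Replace every &param in the template with the matching argument.
    text = re.sub(r"&(\w+)", lambda m: binding[m.group(1)], template)
    return dict(part.strip().split(" = ") for part in text.split(","))


if __name__ == "__main__":
    print(expand("mul accum high"))
    print(expand("loadx XA signed"))
```

The resulting assignments are then subject to the same value checks and defaults as hand-written field assignments.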
{**********************************************************************}
{                                                                      }
{ MCASM (Microtec Assembler)                                           }
{ Definitions for Am29332 32-bit Arithmetic Logic Unit                 }
{                                                                      }
{**********************************************************************}

position: length (6),                { LSB Position or shift count }
          default (0);

width:    length (5),                { Width of field }
          default (31);

case of
0 : begin
    b_width: length (2),             { Byte width of data }
             values (0 : four,
                     0 : long,
                     1 : one,
                     1 : byte,
                     2 : two,
                     2 : short,
                     3 : three),
             default (four);

    Am29332: length (7),             { Instruction }
             values (H'00': ZERO-EXTA,    { Zero extend A }
                     H'01': ZERO-EXTB,    { Zero extend B }
                     H'02': SIGN-EXTA,    { Sign extend A }
                     H'03': SIGN-EXTB,    { Sign extend B }
                     H'04': PASS-STAT,    { Pass status to Y }
                     H'05': PASS-Q,       { Pass Q reg to Y }
                     H'06': LOADQ-A,      { Load A into Q }
                     H'07': LOADQ-B,      { Load B into Q }
                     H'08': NOT-A,        { Not A }
                     H'09': NOT-B,        { Not B }
                     H'0A': NEG-A,        { 2's complement A }
                     H'0B': NEG-B,        { 2's complement B }
                     H'0C': PRIOR-A,      { Output priority A }
                     H'0D': PRIOR-B,      { Output priority B }
                     H'0E': MERGEA-B,     { Merge A with B }
                     H'0F': MERGEB-A,     { Merge B with A }
                     H'10': DECR-A,       { A - 1 }
                     H'11': DECR-B,       { B - 1 }
                     H'12': INCR-A,       { A + 1 }
                     H'13': INCR-B,       { B + 1 }
                     H'14': DECR2-A,      { A - 2 }
                     H'15': DECR2-B,      { B - 2 }
                     H'16': INCR2-A,      { A + 2 }
                     H'17': INCR2-B,      { B + 2 }
                     H'18': DECR4-A,      { A - 4 }
                     H'19': DECR4-B,      { B - 4 }
                     H'1A': INCR4-A,      { A + 4 }
                     H'1B': INCR4-B,      { B + 4 }
                     H'1C': LDSTAT-A,     { Load A into status }
                     H'1D': LDSTAT-B,     { Load B into status }
                     H'1E': undefined1,   { RESERVED }
                     H'1F': undefined2,   { RESERVED }
                     H'20': DN1-0F-A,     { A >> 1, zero fill }
                     H'21': DN1-0F-B,     { B >> 1, zero fill }
                     H'22': DN1-0F-AQ,    { AQ >> 1, zero fill }
                     H'23': DN1-0F-BQ,    { BQ >> 1, zero fill }
                     H'24': DN1-1F-A,     { A >> 1, one fill }
                     H'25': DN1-1F-B,     { B >> 1, one fill }
                     H'26': DN1-1F-AQ,    { AQ >> 1, one fill }
                     H'27': DN1-1F-BQ,    { BQ >> 1, one fill }
                     H'28': DN1-LF-A,     { A >> 1, link fill }
                     H'29': DN1-LF-B,     { B >> 1, link fill }
                     H'2A': DN1-LF-AQ,    { AQ >> 1, link fill }
                     H'2B': DN1-LF-BQ,    { BQ >> 1, link fill }
                     H'2C': DN1-AR-A,     { A >> 1, sign fill }
                     H'2D': DN1-AR-B,     { B >> 1, sign fill }
                     H'2E': DN1-AR-AQ,    { AQ >> 1, sign fill }
                     H'2F': DN1-AR-BQ,    { BQ >> 1, sign fill }
                     H'30': UP1-0F-A,     { A << 1, zero fill }
                     H'31': UP1-0F-B,     { B << 1, zero fill }
                     H'32': UP1-0F-AQ,    { AQ << 1, zero fill }
                     H'33': UP1-0F-BQ,    { BQ << 1, zero fill }
                     H'34': UP1-1F-A,     { A << 1, one fill }
                     H'35': UP1-1F-B,     { B << 1, one fill }
                     H'36': UP1-1F-AQ,    { AQ << 1, one fill }
                     H'37': UP1-1F-BQ,    { BQ << 1, one fill }
                     H'38': UP1-LF-A,     { A << 1, link fill }
                     H'39': UP1-LF-B,     { B << 1, link fill }
                     H'3A': UP1-LF-AQ,    { AQ << 1, link fill }
                     H'3B': UP1-LF-BQ,    { BQ << 1, link fill }
                     H'3C': ZERO,         { Zeros to Y }
                     H'3D': SIGN,         { -1 to Y if N == 1 }
                     H'3E': OR,           { A or B }
                     H'3F': XOR,          { A exclusive or B }
                     H'40': AND,          { A and B }
                     H'41': XNOR,         { A exclusive nor B }
                     H'42': ADD,          { A + B }
                     H'43': ADDC,         { A + B + carry }
                     H'44': SUB,          { A - B }
                     H'45': SUBR,         { B - A }
                     H'46': SUBC,         { A - B - carry }
                     H'47': SUBRC,        { B - A - carry }
                     H'48': SUM-CORR-A,   { Correct BCD A for add }
                     H'49': SUM-CORR-B,   { Correct BCD B for add }
                     H'4A': DIFF-CORR-A,  { Correct BCD A for sub }
                     H'4B': DIFF-CORR-B,  { Correct BCD B for sub }
                     H'4E': SDIVFIRST,    { First step signed / }
                     H'4F': UDIVFIRST,    { First step unsigned / }
                     H'50': SDIVSTEP,     { Iter step signed / }
                     H'51': SDIVLAST1,    { Last step signed / + }
                     H'52': MPDIVSTEP1,   { First step multi / }
                     H'53': MPSDIVSTEP3,  { Last step multi signed }
                     H'54': UDIVSTEP,     { Iter step unsigned / }
                     H'55': UDIVLAST,     { Last step unsigned / }
                     H'56': MPDIVSTEP2,   { Iter step multi / }
                     H'57': MPUDIVSTEP3,  { Last step multi uns }
                     H'58': REMCORR,      { Correct rem after / }
                     H'59': QUOCORR,      { Correct quo after / }
                     H'5A': SDIVLAST2,    { Last step signed / - }
                     H'5B': UMULFIRST,    { First step unsigned * }
                     H'5C': UMULSTEP,     { Iter step unsigned * }
                     H'5D': UMULLAST,     { Last step unsigned * }
                     H'5E': SMULSTEP,     { Iter step signed * }
                     H'5F': SMULFIRST),   { First step signed * }
             default (ADD);
    end;
1 : begin
    pos_src: length (1),             { Source for position }
             values (0 : pins, 1 : reg),
             default (pins);

    wid_src: length (1),             { Source for width }
             values (0 : pins, 1 : reg),
             default (pins);

    Am29332: length (7),             { Instruction }
             values (H'60': NB-SN-SHA,    { A << pos, sign fill }
                     H'61': NB-SN-SHB,    { B << pos, sign fill }
                     H'62': NB-0F-SHA,    { A << pos, zero fill }
                     H'63': NB-0F-SHB,    { B << pos, zero fill }
                     H'64': NBROT-A,      { Rotate A up pos bits }
                     H'65': NBROT-B,      { Rotate B up pos bits }
                     H'66': EXTBIT-A,     { Extract A<pos> }
                     H'67': EXTBIT-B,     { Extract B<pos> }
                     H'68': SETBIT-A,     { A<pos> = 1 }
                     H'69': SETBIT-B,     { B<pos> = 1 }
                     H'6A': RSTBIT-A,     { A<pos> = 0 }
                     H'6B': RSTBIT-B,     { B<pos> = 0 }
                     H'6C': SETBIT-STAT,  { STAT<pos> = 1 }
                     H'6D': RSTBIT-STAT,  { STAT<pos> = 0 }
                     H'6E': NOTF-AL-B,    { Comp B field }
                     H'6F': PASSF-AL-B,   { Pass B, set Z flag }
                     H'70': NOTF-A,       { Comp A field, unalgnd }
                     H'71': NOTF-AL-A,    { Comp A field, aligned }
                     H'72': PASSF-A,      { Pass A field, unalgnd }
                     H'73': PASSF-AL-A,   { Pass A field, aligned }
                     H'74': ORF-A,        { A or B, unaligned }
                     H'75': ORF-AL-A,     { A or B, aligned field }
                     H'76': XORF-A,       { A xor B, unaligned }
                     H'77': XORF-AL-A,    { A xor B, aligned field }
                     H'79': ANDF-AL-A,    { A and B, aligned field }
                     H'78': ANDF-A,       { A and B, unaligned }
                     H'7A': EXTF-A,       { Extract field in A }
                     H'7B': EXTF-B,       { Extract field in B }
                     H'7C': EXTF-AB,      { Extract field in AB }
                     H'7D': EXTF-BA,      { Extract field in BA }
                     H'7E': EXTBIT-STAT,  { Extract STAT<pos> }
                     H'7F': PASS-MASK);   { Generate mask pattern }
    end;
endcase;

borrow: length (1),                  { Borrow mode }
        default (0);

hold:   length (1),                  { Hold status & Q }
        default (0);
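
The case construct above lets the same microword bits carry different fields depending on the instruction class. The Python sketch below illustrates the idea of checking that a microinstruction only assigns fields that exist in its branch. It rests on an assumption that is not legible in the listing: that the branch is selected by the opcode range (H'00' to H'5F' for branch 0, H'60' to H'7F' for branch 1). The names `case_branch` and `illegal_fields` are invented for the example.

```python
# Fields available in every branch, and the fields specific to each branch,
# as read from the Am29332 definitions above.
COMMON = {"position", "width", "Am29332", "borrow", "hold"}
CASE_FIELDS = {0: {"b_width"}, 1: {"pos_src", "wid_src"}}


def case_branch(opcode):
    """Pick the case branch for a 7-bit opcode (ASSUMED to depend on its range)."""
    if not 0 <= opcode <= 0x7F:
        raise ValueError("Am29332 opcode must fit in 7 bits")
    return 1 if opcode >= 0x60 else 0


def illegal_fields(opcode, assigned):
    """Return the assigned field names that do not exist in the opcode's branch."""
    allowed = COMMON | CASE_FIELDS[case_branch(opcode)]
    return sorted(set(assigned) - allowed)


if __name__ == "__main__":
    print(illegal_fields(0x42, ["b_width", "Am29332"]))   # ADD with a byte width
    print(illegal_fields(0x7A, ["b_width", "position"]))  # EXTF-A with a byte width
```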
{**********************************************************************}
{                                                                      }
{ Macros for MCASM (Microtec Assembler)                                }
{ Macros for Am29332 32-bit ALU                                        }
{                                                                      }
{**********************************************************************}

{**********************************************************************}
{                                                                      }
{ datasize - set data size for subsequent operations                   }
{                                                                      }
{**********************************************************************}
macro datasize &sz;
global &dsize;
begin
   &dsize = &sz;
end

{**********************************************************************}
{                                                                      }
{ ALU - set alu operation with fixed data size                         }
{                                                                      }
{**********************************************************************}
macro ALU &op;
global &dsize;
begin
   output ("b_width = &dsize, Am29332 = &op");
end

{**********************************************************************}
{                                                                      }
{ preg - set position source to register                               }
{                                                                      }
{**********************************************************************}
macro preg;
begin
   output ("pos_src = reg");
end

{**********************************************************************}
{                                                                      }
{ wreg - set width source to register                                  }
{                                                                      }
{**********************************************************************}
macro wreg;
begin
   output ("wid_src = reg");
end

{**********************************************************************}
{                                                                      }
{ ALUv - set alu operation for variable data size                      }
{                                                                      }
{**********************************************************************}
macro ALUv &op &pos &width;
begin
   output ("position = &pos, width = &width, Am29332 = &op");
end
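
The `datasize` and `ALU` macros cooperate through the global `&dsize`: the first records a data size, and every later `ALU` call reuses it. A rough Python equivalent is sketched below. The class name `AluMacros` and the initial size of `four` (taken from the default of the byte-width field) are assumptions made for this illustration.

```python
class AluMacros:
    """Mimics the Am29332 macros; dsize plays the role of the global &dsize."""

    def __init__(self, dsize="four"):   # ASSUMED initial value
        self.dsize = dsize

    def datasize(self, size):
        """Record the data size for subsequent fixed-size ALU operations."""
        self.dsize = size

    def alu(self, op):
        """Fixed-size operation: byte width comes from the recorded data size."""
        return {"b_width": self.dsize, "Am29332": op}

    def aluv(self, op, pos, width):
        """Variable-size operation: position and width are given explicitly."""
        return {"position": pos, "width": width, "Am29332": op}


if __name__ == "__main__":
    macros = AluMacros()
    macros.datasize("short")
    print(macros.alu("ADD"))
    print(macros.aluv("EXTF-A", 8, 12))
```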
/**********************************************************************/
/*                                                                    */
/* MetaStep (Step Assembler)                                          */
/* Definitions for Am29325 32-bit Floating Point Processor           */
/*                                                                    */
/**********************************************************************/

enr:      length (1),                /* Load Register R */
          values (0 : LOAD, 1 : NOP),
          default (NOP);

ens:      length (1),                /* Load Register S */
          values (0 : LOAD, 1 : NOP),
          default (NOP);

enf:      length (1),                /* Load Register F */
          values (0 : LOAD, 1 : NOP),
          default (NOP);

R_select: length (1),                /* R Source Select */
          values (0 : BUS, 1 : F-Reg),
          default (BUS);

S_select: length (1),                /* S Source Select */
          values (0 : S-Reg, 1 : F-Reg),
          default (S-Reg);

Am29325:  length (3),                /* FPU Instruction */
          values (0 : PLUS,          /* F = R + S       */
                  1 : MINUS,         /* F = R - S       */
                  2 : MUL,           /* F = R * S       */
                  3 : 2MINUS,        /* F = 2 - S       */
                  4 : FLOAT,         /* F = float R     */
                  5 : INT,           /* F = int R       */
                  6 : DEC,           /* F = dec R       */
                  7 : IEEE),         /* F = ieee R      */
          default (0);

round:    length (2),                /* Rounding Mode */
          values (0 : NEAREST,
                  1 : DOWN,
                  2 : UP,
                  3 : ZERO),
          default (NEAREST);
/**********************************************************************/
/*                                                                    */
/* Macros for MetaStep (Step Assembler)                               */
/* Macros for Am29325 32-bit Floating Point Processor                 */
/*                                                                    */
/**********************************************************************/

/**********************************************************************/
/*                                                                    */
/* Load R Register                                                    */
/*                                                                    */
/**********************************************************************/
macro loadr &src;
begin
   R_select = &src, enr = LOAD
endm;

/**********************************************************************/
/*                                                                    */
/* Load S Register                                                    */
/*                                                                    */
/**********************************************************************/
macro loads;
begin
   ens = LOAD
endm;

/**********************************************************************/
/*                                                                    */
/* Load F Register                                                    */
/*                                                                    */
/**********************************************************************/
macro loadf;
begin
   enf = LOAD
endm;

/**********************************************************************/
/*                                                                    */
/* Do all 1 operand FPU operations                                    */
/*                                                                    */
/**********************************************************************/
macro fpu &op &s;
begin
   Am29325 = &op, S_select = &s
endm;

/**********************************************************************/
/*                                                                    */
/* Do all 0 operand FPU operations                                    */
/*                                                                    */
/**********************************************************************/
macro fcvrt &op;
begin
   Am29325 = &op
endm;
/**********************************************************************/
/*                                                                    */
/* MetaStep (Step Assembler)                                          */
/* Definitions for Am29334 Four-Port Register File                    */
/*                                                                    */
/**********************************************************************/

Wrt_enable_A: length (4),            /* Write enable for port A */
              values (H'0' : double,
                      H'8' : 3byte,
                      H'3' : high-word,
                      H'C' : low-word,
                      H'7' : byte3,
                      H'B' : byte2,
                      H'D' : byte1,
                      H'E' : byte0,
                      H'F' : none),
              default (none);

OEA:          length (1),            /* Port A output enable */
              values (0 : enable,
                      1 : disable),
              default (disable);

A-write:      length (6);            /* A write address */

A-read:       length (6);            /* A read address */

Wrt_enable_B: length (4),            /* Write enable for port B */
              values (H'0' : double,
                      H'8' : 3byte,
                      H'3' : high-word,
                      H'C' : low-word,
                      H'7' : byte3,
                      H'B' : byte2,
                      H'D' : byte1,
                      H'E' : byte0,
                      H'F' : none),
              default (none);

OEB:          length (1),            /* Port B output enable */
              values (0 : enable,
                      1 : disable),
              default (disable);

B-write:      length (6);            /* B write address */

B-read:       length (6);            /* B read address */
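
The write-enable codes above follow a visible pattern: each of the four bits appears to act as an active-low enable for one byte. The Python sketch below checks that reading against the table. The active-low interpretation and the mapping of bit i to byte i are inferred from the symbolic names, not stated in the listing, and `bytes_written` is a name invented for this example.

```python
# Symbolic write-enable values copied from the Am29334 definitions.
WRT_ENABLE = {
    "double": 0x0, "3byte": 0x8, "high-word": 0x3, "low-word": 0xC,
    "byte3": 0x7, "byte2": 0xB, "byte1": 0xD, "byte0": 0xE, "none": 0xF,
}


def bytes_written(symbol):
    """List the byte indices written, ASSUMING bit i low enables byte i."""
    code = WRT_ENABLE[symbol]
    return [i for i in range(4) if not (code >> i) & 1]


if __name__ == "__main__":
    for name in WRT_ENABLE:
        print(f"{name:10s} H'{WRT_ENABLE[name]:X}' -> bytes {bytes_written(name)}")
```

Under this reading every mnemonic matches its code: `low-word` enables bytes 0 and 1, `high-word` bytes 2 and 3, and `none` leaves all four enables inactive.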
/**********************************************************************/
/*                                                                    */
/* MACROS for MetaStep (Step Assembler)                               */
/* Macros for Am29334 Four-Port Register File                         */
/*                                                                    */
/**********************************************************************/

/**********************************************************************/
/*                                                                    */
/* SrcA - select A register source                                    */
/*                                                                    */
/**********************************************************************/
macro SrcA &n;
begin
   A-read = &n, OEA = enable
endm;

/**********************************************************************/
/*                                                                    */
/* SrcB - select B register source                                    */
/*                                                                    */
/**********************************************************************/
macro SrcB &n;
begin
   B-read = &n, OEB = enable
endm;

/**********************************************************************/
/*                                                                    */
/* DestA - select A register destination and size                     */
/*                                                                    */
/**********************************************************************/
macro DestA &n &size;
begin
   A-write = &n, Wrt_enable_A = &size
endm;

/**********************************************************************/
/*                                                                    */
/* DestB - select B register destination and size                     */
/*                                                                    */
/**********************************************************************/
macro DestB &n &size;
begin
   B-write = &n, Wrt_enable_B = &size
endm;
5.4 MICROCODE DEVELOPMENT 


5.4.1 Step Engineering 
32-Bit Development Tools 


Step Engineering offers an integrated set of powerful 
development tools for the design and development of 
microprogram-based systems. In particular, these devel- 
opment tools are well suited for use with 32-bit building 
block devices such as the Am29300 family of compo- 
nents from AMD. 


For the 32-bit system designer, the MetaStep Language 
System provides a powerful and flexible language 
definition, design, and development system for 
customized microinstructions and microprograms. An 
important feature of the language is its ability to 
support both high order language constructs and 
bit-vector level operations. In addition, comprehensive 
source level debug facilities are inherent in the language, 
with a link to the STEP-40 SDT hardware debug stations. 


The STEP-40 SDT is Step's system-level development 
tool for Am29300 32-bit microprogram-based design. It 
offers a comprehensive array of hardware tools and user 
interface software that supports every level of the 
development task. 


The MetaStep Language System 


The MetaStep Language System from Step Engineering 
is a powerful new microprogramming tool for the 
programmer/designer who wishes to use 
microprogram-based devices such as the Am29300 family, 
the Am2901, the Am2910, the Am29116, and many other 
bit-slice or microprogrammable units. MetaStep is a 
full-featured and well-structured microprogram 
meta-assembler with advanced features that give the 
programmer great power and flexibility. Both an elegant 
high order and a powerful bit-level language system, 
MetaStep includes five interrelated language modules 
and an AMDASM-to-MetaStep translator program. 


A unique feature of the MetaStep Language is the 
MetaStep QuickLearn Environment. This integral 
environment expedites the development and debug of 
microprograms by providing a menu-driven, interactive 
program that gives the user instant access to a 
user-selected editor, a file display program, a directory 
listing, an automated definition file generator, and the 
MetaStep assembler. This program lets the user easily 
generate a definition file, assemble a program, move 
quickly from an assembly error directly to the line in 
the source code that contains the error, correct that 
error, and return to assembly. With single keystrokes the 
user can select from a variety of options and move quickly 
from one programming environment to another. 


These features can greatly increase the speed and 
accuracy of definition file and microprogram generation 
by eliminating much of the tedious, time-consuming and 
error-prone task of catching and correcting syntactical 
errors. 


Unlike earlier, more primitive microprogram assemblers, 
the MetaStep language system provides both high level 
and low level programming constructs for the designer/ 
programmer. For the hardware designer/debugger, 
MetaStep supports any “close to the hardware” program- 
ming style with total control of bit level field constructs. 
This is termed bit vector level coding. MetaStep is also 
the ONLY microprogram meta-assembler to support true 
source level debug when linked to a STEP-40 SDT 
system. 


MetaStep supports a full range of macro instruction 
features that let the programmer easily and quickly take 
full advantage of the power inherent in devices such as 
the Am29332 ALU, the Am29331 Sequencer, the 
Am29334 Register File, the Am29C323 Multiplier and 
Am29325 Floating Point Processor. 


This flexible language provides the ability to create 
complex high level language constructs specifically tai- 
lored to your application. These constructs can be of any 
complexity, up to and including those of a custom lan- 
guage compiler. Of particular interest is the ability to 
intersperse bit-level instructions freely among high order 
constructs. This allows performance-critical code to be 
hand-crafted and placed within high order assembly or 
even high level language statements. 


Design rule constraint management, error checking, 
data field validation, user-defined warning messages, 
and automatic pipeline compensation mechanisms pro- 
vide a rich, defensive programming environment that 
permits error detection at assembly time, rather than at 
debug or runtime. 


MetaStep features include a free-form and position- 
independent syntax, informative listings of macro expan- 
sions, field assignments, default assignments, symbol 
cross references, and symbol table listings, automatic 
hardware-to-software bit position mapping, field check- 
ing facilities, pipeline delay facilities, constraint manage- 
ment, consumption of AMDASM code, 28 expression 
operators, close interface to runtime debug facilities, and 
generation of files that give runtime information in sym- 
bolic form. MetaStep also supports meta-disassembly. 





Reprinted with permission from Step Engineering, Inc. 
MetaStep is presently distributed for use on five different 
types of systems: CP/M-68K-based systems, MS-DOS-based 
systems, VAX/UNIX-based systems, VAX/VMS-based 
systems, and SUN UNIX-based workstations. Support for 
other operating systems will be added in the future. 


The five MetaStep language modules are called the 
Definition Processor, the Assembler Processor, the 
Linker Processor, the Format Processor and the UDS or 
User-Defined Symbolics Processor. 


The Definition processor is used to define a language for 
a given target architecture, field by field, with logical 
groupings where appropriate. The definition processor 
defines constraints over fields, groups of fields, and 
entire instructions. Included in the definition processor is 
the ability to define macroinstructions, constants, and 
variables only once, and to then make those values 
available to the entire language system. 


The Assembler processor is a macro-driven, relocating 
and constraint maintaining microprogram assembler. It 
produces relocatable object modules, error, warning, 
and user-defined messages, and symbolic output for use 
by the linker and system debuggers. 


The Linker processor generates absolute code as well as 
debug, symbol, and structure tables from definition 
processor and assembler processor output files. 


The Formatter processor takes the absolute object file 
output of the linker and extracts several different types of 
information. These include a binary output file loadable 
into a STEP-40 SDT development tool, a hexadecimal 
output file, a symbol file with user program global labels 
and addresses, and a debug file for on-line assembly/ 
disassembly and source level debug. 


The User-Defined-Symbolics processor automatically 
generates User-Defined-Symbolics or UDS files. This 
frees the debug engineer who wishes to perform debug 
functions at the source level from the task of redefining 
the symbolics of the language every time he does a re- 
assembly. 


The AMDASM-to-MetaStep translator offers the ability to 
take current AMDASM assembly source code and auto- 
matically translate that source into a syntactic form that 
is accepted by the MetaStep assembler. 


MetaStep can be configured to execute in two 
environments: the station model, intended for use on a 
STEP-40 SDT development station; and the no-station 
model, intended for use in environments that do not use 
the STEP development stations or MetaStep language 
system debug and symbol files. 


Some of the more important features of MetaStep are: 


Free-form, non-positional keyword syntax 


Powerful macro facility 


Symbolic field names 


Data types such as strings, integer, and 
enumeration 


If and For assembler directives 


Case statements 


Recursive expression facility 


Attribute operators 


Modular programming support 


Design rule management 


Automatic pipeline delay compensation 


Relocatable object code 


Any order bit-to-field assignments 


Link to true source level debug 


Easy integration to hardware debug station 


Consumes AMDASM source code 


Fast (10,000 fields/minute) one-pass operation 


MetaStep solves the problems associated with older 
positional microprogram assemblers, i.e., the difficulties 
in keeping track of fields and field values by rote and 
precise positioning, the lack of any value or error 
checking mechanisms, the lack of a link to a hardware 
debug system at the symbolic level, and the lack of any 
means of reconstructing backwards from the microword to 
the bit fields that comprise it. 


MetaStep provides the non-positional capability to define 
fields in logical order rather than simply by microcode 
instruction address, and includes support for nested 
macros, case structures, and keyword parameters. The 
following is an illustration of a partial MetaStep program. 
As can be seen below, MetaStep has the ability to 
support both bit vector and high level coding techniques. 
The upper program segment illustrates a field by field 
programming style that uniquely declares each pertinent 
field in the microinstruction word. The lower segment 
shows a second MetaStep example that uses only high 
level statements to perform the same operation. Using 
high level language constructs greatly eases the 
programming task. For convenience and power, the 
programmer can intermix low level and high level program 
statements, or start the programming task with simple 
statements and then move to more complex usages as 
experience grows. 


Two illustrative MetaStep program statements: 


Should the programmer/designer wish to program at the 
bit vector level, a simple MetaStep bit vector level pro- 
gram could be written like: 


OP116 = TORAA, 
SRCDST = OR, REG = R1, 
CTLYEN = YEN_L, CCMUX = T1, 
2910INST = CONT, TCONTROL = NI, 
JMPADR = WALK, DLE = DLE_H, OET = OET_H, 
SRE = SRE_L, IEN = IEN_L, 
OEY = OEY_L 

A comparable MetaStep partial program using High 
Order Language or HOL constructs would look like this: 


ACC <- ACC OR R1 
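
The relationship between the two fragments can be sketched in Python as a translation from the high order statement to the bit vector field list. This is a toy illustration only: the field names and values are copied from the bit vector fragment, but the rule that the operator lands in `SRCDST` and the register in `REG`, the accepted statement pattern, and the function name `translate` are all assumptions, not MetaStep behavior.

```python
import re

# Control fields that stay fixed in the bit vector fragment above.
BASE = {
    "OP116": "TORAA", "CTLYEN": "YEN_L", "CCMUX": "T1",
    "2910INST": "CONT", "TCONTROL": "NI", "JMPADR": "WALK",
    "DLE": "DLE_H", "OET": "OET_H", "SRE": "SRE_L",
    "IEN": "IEN_L", "OEY": "OEY_L",
}

# Hypothetical statement form: ACC <- ACC <operator> <register>
PATTERN = re.compile(r"ACC\s*<-\s*ACC\s+(\w+)\s+(\w+)$")


def translate(statement):
    """Turn one high order statement into a full set of field assignments."""
    match = PATTERN.match(statement.strip())
    if match is None:
        raise SyntaxError(f"unsupported statement: {statement!r}")
    operator, register = match.groups()
    return {**BASE, "SRCDST": operator, "REG": register}


if __name__ == "__main__":
    for field, value in translate("ACC <- ACC OR R1").items():
        print(f"{field} = {value}")
```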


While the bit vector and HOL fragments above illustrate 
the simplicity of using MetaStep, the microprogrammer may 
very well be more concerned with power and flexibility. 
Devices like the Am29332 are complex devices with powerful 
instruction sets. To best take advantage of their power, 
MetaStep can incorporate all of the possible 
configurations of an Am29332 instruction into one clear 
MetaStep instruction. 


For example, there are numerous options available to the 
programmer on each Am29332 instruction. Fixed length 
and variable length instructions such as MOVEs, 
SHIFTs, ADDs, SUBTRACTs, and MULTIPLY/DIVIDEs offer 
several different source and destination locations 
depending upon the class of instruction. With MetaStep, 
a programmer need define each Am29332 instruction 
only once, using high level constructs such as the CASE 
directive to define all of the possible configurations of the 
instruction. Then, throughout the program, that definition 
can be used through a simple high order instruction 
mnemonic that takes into account all of the various 
complications associated with that instruction and its 
data and source combinations. 


In addition, he can prevent microprogramming errors by 
providing error checking conditions within the instruction 
definition, so that illegal conditions are flagged at the 
assembly level, not at the debug level. 


In this way, the programmer can reduce a large and 
complex instruction set to a few easy to remember 
mnemonics. This frees the programmer to concentrate 
on the logic of the program and to apply all of the 
power of the Am29300 family to the design quickly. 


MetaStep system components share a common data- 
base and utilize common control constructs. The defini- 
tion processor provides the capability to define variables, 
a string facility that allows concatenation, and it supports 
cohesion operations as well as 28 expression operators. 
The definition processor's ability to nest macros, pass 
variables through macro expansions, and perform recur- 
sion makes it a powerful facility for creating custom 
languages. 


Constraint management facilities include a check 
descriptor that may be used to test constraints on a single 
field, a case branch, an entire microinstruction, or 
between microinstructions. Most importantly, rules of the 
target architecture may be embedded in the language 
facilities to detect bugs at assembly time rather than 
debug time. This facility allows user-defined, 
procedure-based design rules to be enforced. 
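
The idea of attaching checks at the field level and at the microinstruction level can be sketched in Python as follows. This is a conceptual illustration rather than MetaStep syntax: the function `assemble_checked`, the table layout, and both rules are hypothetical, chosen only to show how an illegal combination is reported before any code reaches the hardware.

```python
def assemble_checked(word, field_checks, word_checks):
    """Return a list of error messages; an empty list means the word is legal."""
    errors = []
    # Field-level checks look at one assigned value at a time.
    for name, (predicate, message) in field_checks.items():
        if name in word and not predicate(word[name]):
            errors.append(f"{name}: {message}")
    # Instruction-level checks see every field of the microinstruction.
    for predicate, message in word_checks:
        if not predicate(word):
            errors.append(message)
    return errors


# Hypothetical design rules, for illustration only.
FIELD_CHECKS = {
    "position": (lambda v: 0 <= v < 64, "must fit in 6 bits"),
}
WORD_CHECKS = [
    (lambda w: not (w.get("pos_src") == "reg" and w.get("position", 0) != 0),
     "position must stay 0 when pos_src = reg"),
]

if __name__ == "__main__":
    print(assemble_checked({"position": 5, "pos_src": "pins"},
                           FIELD_CHECKS, WORD_CHECKS))
    print(assemble_checked({"position": 70, "pos_src": "reg"},
                           FIELD_CHECKS, WORD_CHECKS))
```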


With MetaStep, memory space controls allow code to be 
generated for not only multiple segments, but multiple 
memory segments. This allows a single program to 
generate code for modern architecture class machines 
such as Harvard class machines and data flow architec- 
tures that typically contain multiple program stores. 


A significant advantage offered by MetaStep is that the 
database files generated from the definition, assembler 
and linker are common and provide a method to pass all 
language constructs to debug tools such as the STEP-40 
SDT. This means that the STEP-40 development tools 
can now have the capability to use the language defini- 
tion files and all symbol tables to create true meta- 
disassembly. Powerful source level debug can greatly 
speed the development of any microprogram design 
and, in particular, as microprogram-based systems 
increase in complexity, true source level debug is a 
necessity. 
MetaStep Quick Reference 


MetaStep System Overview 


• Common system elements shared between 
  MetaStep processors 
• Five Processors 
  - Definition processor 
  - Assembler processor 
  - Linker processor 
  - Format processor 
  - User defined symbolics processor 
• AMDASM to MetaStep translator program 


• COMMON ELEMENTS: All processors share data 
  files and common structures. 

- Common syntax and semantics: include forms of 
  names, constants, directives, and legal and 
  illegal value definitions. 

- Common directives include: 

  - Source Control Directives 
    - Listings - forms control, summary information 
    - Include - source inclusion 
    - Format - listing headings, trailers, and 
      control 

  - Flow-of-Control Directives 
    - If - fully nested conditional control 
    - For - repetitive conditional control statement 

- Macro facilities, including nested macro 
  capability and parameter passing and expansion. 
  - Specification of assembly time constructs. 
  - Shorthand specification of logical groupings 
    of assignments. 
  - Generation of warning and error messages. 


• DEFINITION PROCESSOR: accepts a definition 
  of the target system architecture and development 
  environment. 

- Micro-architecture description: by means of 
  instruction/field formats. 

- Instruction directive: names the architecture 
  and specifies instruction length. Maximum 
  instruction length is 1024 bits. 

- Field Description: defines a field as a group of 
  bits (not necessarily contiguous) that perform a 
  common function. Each field must be given a 
  field description. 


A full set of field descriptors is as follows: 

• bits - define absolute bit locations of field in 
  microinstruction 
• check - constraint check on assignment to this 
  field 
• complement - two's complement field value 
• default - provide value when field is not 
  assigned 
• display - provide debugger and default radix 
  information 
• invert - one's complement field value 
• length - specify length of field 
• mask - truncate values to field length 
• parity - this field is the parity field 
• reverse - reverse bits in field 
• valid - specify legal values for field 
• values - specify symbolic values for field 


VALUES, VALID, and CHECK provide syntactic,
semantic, and pragmatic verifications on a per-field
basis.


VALUES provides syntactic information indicating
what are acceptable values for assignment to a
field.


VALID provides semantic information, listing all
the acceptable values for the field.


CHECK provides a way of examining assigned values
in the context of other field values or other state
information.


- The Case Definition: alternative field interpretations.
  A case definition can be specified for each
  field. It is a powerful mechanism for defining
  alternative bit values for overlapping fields.


- The Environment Description: allows the programmer
  to specify the development environment, with
  constraints on field values, sequences of
  microinstructions, and the relationship between field
  values.


Features include:
* bitMap
* macros
* EQU symbols
* variables


- Constraints are provided in three general ways:

* Symbolic values
* Case branch constraints
* Check descriptors - The check descriptor associates
  a constraint macro with one of the following:
  - a single field
  - a case branch
  - the entire microinstruction


- Validations: numerous checks performed at definition
  time verify that field names and values in
  case branches are consistent.
* THE METASTEP ASSEMBLER: supports coding
  styles ranging from bit vector specification through
  high order language expression and each stage in
  between. Allows mixing of bit vector and HOL
  expressions during coding.


- Instructions: a series of comma-separated
  phrases. A phrase may be a field assignment, a
  macro-invocation, or a flow-of-control directive.


- Field Assignments: consist of a field name, followed
  by an equal sign, followed by an expression.


- Macro Phrases: a macro-invocation is a macro
  name, optionally followed by parameters. Macros
  may be nested.


- Relocation Facilities

* org
* align
* reserve
* segment
* entry
* point
* external
* METASTEP LINKER: combines all system elements
  into absolute code that can be loaded into ROMs or
  simulators. It also produces debug tables.


Directives:

* load
* name
* locate
* reserve
* fill
* mapPoint
* analyze
* set
* parity


* AMDASM TO METASTEP TRANSLATOR: produces
  MetaStep source statements from AMDASM
  source statements.


The Step-40 SDT 


The STEP-40 SDT is the premier hardware-based devel- 
opment tool for any microprogram development task. In 
particular, it offers a comprehensive system for the 
design and debug of Am29300-based systems. It offers 
in one integrated chassis all of the development and 
debug tools needed for such an effort. With high reliability 
cabling and interconnect technology, the hardware 


chassis permits the plug-in addition of a wide range of 
distinct but interrelated hardware tools. An IBM-PC/AT 
computer system provides the human interface, mass 
storage, and I/O devices. 


Key Features of the STEP-40 SDT: 


* Fully supports 32-bit Am29300-based system development
  and debug.

* Supports other microprogrammed products such as
  bit-slice, ASIC, DSP, or VLSI.

* Completely integrated hardware/software development
  station.

* Powerful IBM-PC/AT-based microprogram support
  instrument.

* Supports MetaStep, the first true high level language
  for microprogram development with in-line bit vector
  level support.

* SOURCE LEVEL DEBUG available at all levels of
  hardware and software debug.

* Reconfigurable, ultra-reliable 10 to 70 ns writable
  control store supports up to 64K x 512-bit arrays.

* Real-time emulators for popular bit-slice AMD ALUs
  and sequencers.

* Logic state analysis with trace memory and sophisticated
  multi-level control.

* Performance analysis tools like histograms, timing
  analysis, access tracking and predicate analysis.

* Regression Test tools for design validation.

* Meta-Disassembly coupled with source edit, source
  management, version control, and on-line patch
  management.

* User-Defined Symbolics allows conditional disassembly
  of trace or any system data.

* Sophisticated, easy-to-use screen-oriented editor
  with pop-up help menus.


HARDWARE resources include writable control store
modules with the widest range of speeds and widths;
real-time emulators for popular bit-slice parts such as the
Am2910 and Am29116; logic state analysis trace
memory modules with flexible clock and breakpoint
control modules; a histogram/timing analysis module for
performance analysis tasks; and high speed memory
simulation modules for more than 450 popular ROMs,
RAMs, and PROMs. With a powerful high speed bus and
modular hardware design, the STEP-40 SDT presents
no hardware limitations for designers utilizing the most
advanced microprogrammed devices.


SOFTWARE tools include a sophisticated, easy-to-use, 
screen-oriented editor; a powerful turbo programmer's
environment for fast, error-free program development
and debug; MetaStep for superior high level and bit- 
vector level programming; User-Defined Symbolics for 
comprehensive on-line symbolic debug; Meta-Disas- 
sembly for true interactive symbolic debug with full 
access to MetaStep symbol tables; and performance 
analysis tools like histogram and time stamping, 
regression testing and automated test suite generation 
tools. The STEP-40 SDT is the first system to offer 
source level debug throughout the development and 
debug environment. 


Because the STEP-40 SDT is an IBM-PC/AT based 
development station, it gives you the best of both worlds: 
a wide range of comprehensive hardware debug re- 
sources coupled with a fast, convenient and well-sup- 
ported computer system. The IBM AT, in particular, offers 
the widest range of software support of any lab-based 
system in the industry. The IBM-PC/AT workstations 
have the power to match the STEP-40 SDT debug 
station. As intelligent hosts they can support advanced 
user interfaces and control the multiple hardware re- 
sources. In addition, system updates and new features
can be added quickly thanks to the flexibility inherent in
these standard workstations. As hardware needs
change, the user need only add hardware modules to the 
STEP-40 SDT specialized hardware chassis. 


Hardware Tools 


Plug-in writable control store modules are available with 
flexible array configurations from 1K x 64 to 16K x 128 per 
module. Modules can be mapped into arrays of up to 64K 
x 512 bits in size. Access times vary from 70 ns to 10 ns 
(and even faster when RAM technology permits). 


The Writable Control Store (WCS) is a dual-port memory 
accessible from either the STEP-40 SDT or the target 
system. Both ECL and TTL RAM are supported with the 
industry's most comprehensive array of memory emulation.
Having up to 16K x 128 bits on a single WCS module,
rather than many small boards connected with many cables,
dramatically improves reliability and signal integrity. The
user can configure the store to meet the design objective
without sacrificing reliability or performance. Further, the
STEP-40 SDT can support up to 32 independent arrays
controlled by either a single clock or multiple clocks.


Available Modules: 


* WCS-64 is the fastest STEP WCS. It uses 10 ns ECL
  RAMs and connects to the target via address and
  data pods containing ECL to TTL translators. Organized
  as 1K x 64 or 2K x 32 bits.


* WCS-128 provides twice the density of the WCS-64
  with 10 ns ECL RAMs. Organized as 2K x 64, 4K x 32,
  or 8K x 16 bits.


* WCS-256 and WCS-1024 provide even larger memories
  for applications with less demanding speed requirements.
  WCS-256 is configured as 4K x 64, 8K x 32, or 16K x 16.
  WCS-1024 is configured as 16K x 64, 32K x 32, or
  64K x 16 bits. Interface circuitry matches exact user
  memory specifications.


LOGIC STATE ANALYSIS (LSA) - provides trace mem- 
ory modules with sophisticated clock, breakpoint and 
trace control. With true conditional bit-mapped disas- 
sembler (User Defined Symbolics or UDS), the LSA 
provides real-time 3-way branching using a 54-bit match- 
word to trigger the 25 MHz or 50 MHz trace memory. 
Linkage is provided to the symbol table of the user’s 
source code for access to symbolic debug information. 
Source code can be interleaved with trace samples for
easy cause (microinstruction) and effect (trace sample)
readability and comparison.


TRACE MEMORY is provided with either 4K (TM-256) 
or 16K (TM-1024) bits of real-time trace memory at 
speeds of 16 MHz, 25 MHz or 50 MHz. These memories 
act as a circular buffer storing the last 4K or 16K store 
samples. Store clock filtering extends the effective buffer 
depth substantially by filtering out unwanted samples. 
Triggering and sampling are controlled by the trace
control module.
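
The circular-buffer behavior and store clock filtering described above can be sketched in a few lines of Python. This is only an illustrative model, not STEP software: the class name `TraceMemory`, the `qualifier` callback, and the tiny depth used in the usage lines are all assumptions made for the example.

```python
from collections import deque


class TraceMemory:
    """Toy model of a circular trace buffer with store-clock filtering.

    Hypothetical illustration only; names and interface are assumptions.
    """

    def __init__(self, depth=4096, qualifier=None):
        # A deque with maxlen silently discards the oldest sample when full,
        # so it always holds the last `depth` stored samples.
        self.samples = deque(maxlen=depth)
        # The qualifier decides whether a sample is stored at all.
        # By default every sample is stored.
        self.qualifier = qualifier if qualifier is not None else (lambda s: True)

    def clock(self, sample):
        """Present one sample; store it only if the qualifier accepts it."""
        if self.qualifier(sample):
            self.samples.append(sample)


# Usage: a 4-deep buffer that keeps only even-valued samples.
trace = TraceMemory(depth=4, qualifier=lambda s: s % 2 == 0)
for value in range(20):
    trace.clock(value)
last_samples = list(trace.samples)
```

Because rejected samples never enter the buffer, the same depth covers a longer stretch of execution, which is the sense in which filtering extends the effective buffer depth.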


TRACE CONTROL modules include the sophisticated 
clock and breakpoint controls. With a screen editor 
display, the user can set up to five 54-bit (16 address, 32 
data, and 6 external qualifiers) matchwords per level to 
qualify trace memory sampling. Up to 16 independent 
levels for trace triggering or breakpoint are possible, with 
each level allowing for three way branching on an IF, 
ELSE-IF, ELSE-IF basis. A delay counter can be used on 
each IF branch to count occurrences of the 54-bit matchword
or store cycles.
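
As a rough illustration of how a 54-bit matchword built from 16 address bits, 32 data bits, and 6 external qualifier bits might be compared against a sample, consider the sketch below. The field ordering inside the word and the use of a "care" mask for don't-care bits are assumptions made for the example; the source does not describe the actual hardware encoding.

```python
ADDR_BITS, DATA_BITS, QUAL_BITS = 16, 32, 6  # 16 + 32 + 6 = 54 bits


def pack_sample(address, data, qualifiers):
    """Pack the three parts into one 54-bit integer (assumed field order)."""
    for value, bits in ((address, ADDR_BITS), (data, DATA_BITS), (qualifiers, QUAL_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value:#x} does not fit in {bits} bits")
    return (address << (DATA_BITS + QUAL_BITS)) | (data << QUAL_BITS) | qualifiers


def matches(sample, matchword, care_mask):
    """True when sample equals matchword on every bit set in care_mask."""
    return ((sample ^ matchword) & care_mask) == 0


# Usage: trigger on address 0x01F0 regardless of data and qualifiers.
ADDRESS_ONLY = ((1 << ADDR_BITS) - 1) << (DATA_BITS + QUAL_BITS)
target = pack_sample(0x01F0, 0, 0)
hit = matches(pack_sample(0x01F0, 0xDEADBEEF, 0b000101), target, ADDRESS_ONLY)
miss = matches(pack_sample(0x01F1, 0xDEADBEEF, 0b000101), target, ADDRESS_ONLY)
```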
IN-CIRCUIT EMULATORS permit real-time emulation of
popular bit-slice circuits such as the Am2910, Am29116
and other popular devices. The user can directly observe
the internal states of these chips as they execute his
program. The user can examine and modify registers and
stacks. Execution control includes single step, multiple
step and run program commands. Multiple emulators 
can be simultaneously controlled from a single emulator 
control module. STEP in-circuit emulators will operate in 
real-time at the full rated speed of the emulated circuit. 


MEMORY EMULATOR modules support a wide range of 
RAM, ROM and PROM devices. Over 450 popular 
memory devices can be emulated. 


PERFORMANCE ANALYSIS modules provide the hardware
support for software features like histogram and
time stamping. Time analysis can be performed with 12.5
ns resolution. Histograms can be in absolute time or in
microcycles for precise execution measurements. A 48-bit
timer/counter permits continuous analysis over hours
and days, not just seconds.
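
A quick calculation shows why a 48-bit counter covers days rather than seconds. The sketch below assumes the counter advances once per 12.5 ns resolution step; the source gives both figures but does not state that the counter is clocked at exactly that rate, so treat the result as an estimate.

```python
TICK_SECONDS = 12.5e-9  # assumption: one count per 12.5 ns resolution step
COUNTER_BITS = 48


def rollover_seconds(bits=COUNTER_BITS, tick=TICK_SECONDS):
    """Time until a free-running counter of the given width wraps around."""
    return (1 << bits) * tick


seconds = rollover_seconds()
days = seconds / 86400  # 86400 seconds per day
print(f"{seconds:.0f} s, roughly {days:.1f} days before rollover")
```

Under that assumption the counter runs for roughly 40 days before wrapping, which is consistent with the "hours and days" claim.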


Software Tools 


The STEP-40 SDT fully supports METASTEP, thus 
providing the world’s first truly high level microcode 
development language in a fully integrated development 
station. 


METASTEP QUICKLEARN PROGRAMMING ENVI- 
RONMENT is a unique facility that speeds the develop- 
ment of MetaStep programs. The user can quickly switch 
from facility to facility without losing his place in his code. 
This is particularly useful during program debug and 
patch. 


SOURCE LEVEL DEBUG is another unique capability of
the STEP-40 SDT. With the MetaStep language as the
foundation, a microcode-based project can be greatly
accelerated by utilizing symbolic information throughout the
debug cycle. A truly interactive symbolic debug capability,
source level debug permits on-line meta-assembly,
on-line meta-disassembly, run-time editing at the source
level, and directly readable displays.


All STEP-40 SDT commands can reference symbolic 
labels defined in MetaStep. Thus, the user need enter 
and define his labels only once. Later he can use them 
throughout his debug tasks without reentering or redefin- 
ing them. This is a requirement for convenient debug of 
relocatable microcode. Other systems require that the 
user spend endless hours defining his symbolic informa- 
tion each time he reassembles his code. Source Level 
Debug also means that he can control his hardware 
debug resources using this symbolic capability. 


User Defined Symbolics (UDS) provides complete dis- 
play and control of microcode, trace data and emulator 
data. Any arbitrary digital word can be conditionally 
disassembled into any symbolic representation. Unlike 
older systems that merely allow permutation of some 
fields in groups of contiguous bits, UDS gives the user a
general purpose bit mapping (binary to symbolics) capability
unmatched by any other system. UDS has great
utility in hardware trace situations. 


META-DISASSEMBLER capability allows the source 
definition to be accessed by the debug process and 
provides the user the abilities of disassembling his 
source code in-line, assembling in-line, plus insertion of 
additional microcode. 


PERFORMANCE ANALYSIS capabilities include histo- 
grams and time stamping. 


HISTOGRAMS permit absolute time or microcycle 
analysis of your microcode execution. With a 48-bit 
counter, time analysis can be performed over days and 
weeks if necessary, not just seconds. This analysis can 
give you graphical information showing where code 
optimization can best help overall system performance. 


TIME STAMPING includes a 12.5 ns resolution to easily 
measure time between captured system events and 
provides both absolute and relative time stamping in both 
time and microcycles. 


QUALITY ASSURANCE TOOLS aid in reducing overall 
system costs and in rapid test development. These 
include access tracking, predicate analysis and 
MetaStep facilities for maintenance of source and ver- 
sion control. 


REGRESSION TESTS such as AUTOSTEP provide 
the capability to generate, store and reuse system vali- 
dation tests from design definition throughout the life of 
the product. 


Hardware Specifications 


6-Slot Mainframe: 
6-user slots available per chassis 
Expandable backplane 


MetaMachines: 
Up to 32 per mainframe, each with separate data, address
and/or clock inputs.


Writable Control Store: 
Total Address Space: 64K deep x 512 bits wide. 
Modules:
WCS-64 - 1K x 64/2K x 32,
10ns or 15ns RAM speed.
WCS-128 - 2K x 64/4K x 32/8K x 16,
15ns or 25ns RAM speed.
WCS-256 - 4K x 64/8K x 32/16K x 16,
25ns or 35ns RAM speed.
WCS-512 - 4K x 128/8K x 64,
10ns, 15ns, and 25ns RAM speed.
WCS-1024 - 16K x 64/32K x 32/64K x 16,
35ns or 70ns RAM speed.
WCS-2048 - 16K x 128/32K x 64,
25ns, 30ns or 70ns RAM speed.


Simulation Pods:
ECL to TTL/TTL to ECL conversion
TTL specifications
Unlimited number of arrays


Trace Memory:
Sizes: 4K x 64 bits or 16K x 16 bits
Number: up to 8 modules per trace controller.


Clock, Trace and Breakpoint Controller:
16-level, 54-bit match word, conditional trace and
break supported.


Logic State Analysis Control:
16-states, comprehensive control through counters,
timers, conditionals, triggers, and unlimited breakpoints.


Additional information about MetaStep, the STEP-40 SDT
and other Step tools for developing Am29300-based
systems is available upon request from Step Engineering.
Please contact:

Step Engineering, Inc.
661 East Arques Ave.
P.O. Box 61166
Sunnyvale, CA 94088
(408) 733-7837
(800) 538-1750
TWX: 910-339-9506
5.4.2 Microtec Research
mcASM Structured Microcode Assembler


The mcASM microcode assembler provides software 
support for the Am29300 family. A second generation 
Structured Microcode Assembler, mcASM was the result 
of a joint effort between Advanced Micro Devices and 
Microtec Research. Ten years of bit-slice and microcode 
assembler experience within both companies has been 
combined with the latest software technology to produce 
this advanced implementation of a relocatable microc- 
ode assembler. 


Special support is provided for the variable formats found
in the Am29300 family. This support is an additional 
benefit as it provides constraint management for the 
entire microcode word. New features make mcASM 
faster and easier to use than previous microcode assem- 
blers. These features allow the programmer to concen- 
trate on the target system algorithm, thereby achieving a 
more competitive target system. 


mcASM Features

* Am29300 family mnemonic definitions included
* Hosted on VAX/VMS and PC/DOS
* PROM programmer, Microtec, AMD, and STEP
  output formats
* Relocatable code segments
* Overlay support
* Macros with keyword parameters
* Automatic selection of word format
* Keyword syntax
* Local symbols for each field
* Fields defined with non-contiguous or contiguous
  bits

Description 
As a meta-assembler, mcASM is used to assemble
source programs targeted for a user defined set of
hardware. First, a model definition program, mcDEF, is
used to define the target mnemonics and their corresponding
bit patterns for the assembler, mcASM. Then,
mcASM assembles the user's source program into
microinstructions for the target.


This meta-assembler is optimized for microcode applications
where very wide word widths (up to 1024 bits) are
not uncommon. A library of pre-defined part definitions is
included with mcASM for the Am29300 family and other
AMD microcode driven products to help the user quickly
build the hardware definition file.


Four related programs make up the product: mcDEF, 
mcASM, mcLINK, and mcPROM. 


A model of the target system is defined using the mcDEF 
definition language. The model is then compressed into 
a lookup table by the definition program, mcDEF. 


The model lookup table allows the microcode assembler, 
mcASM, to translate the user’s assembly language 
source code into microcode bit patterns that drive the 
target system. Object modules generated by mcASM are 
in a relocatable format. Thus, smaller, more manageable
source files can be generated. These can be independ- 
ently updated and quickly reassembled. 


Relocatable object modules are linked together with 
mcLINK to form an absolute executable microcode pro- 
gram. The program may include overlayed segments to 
conserve target system memory. Four formats may be 
selected as the mcLINK output format. These include 
mcFMT, AMDASM, Microtec META29, and STEP Engi- 
neering GENHEX. 


A fourth program, mcPROM, converts the linker output 
into PROM files that can be downloaded into a PROM 
programmer. DATA I/O ASCII format and BNPF format 
are supported. 


Figure 5-4 shows an overview of the mcASM develop- 
ment process and the following sections describe each 
component of the mcASM package. 


mcDEF - Definition Program


The mcDEF definition program is a table builder that 
converts a model of the target hardware into a compact 
lookup table for later use by the assembler. The model is 
required by the assembler to describe how mnemonic 
names, used by the programmer, are converted into bit
fields in a microcode word. 


mcDEF accepts an input file that describes the field 
structure of the microcode word. Each field is independ- 
ently described so it can be uniquely referenced by name 
in the assembly source code. The programmer can then
directly reference any field and assign a value without
having to put the value in a prescribed position in a source
statement.


Each field can also be assigned a default value so that all 
fields do not need to be encoded in each line of source 
code. Mnemonics assigned a value for a field are local to 
that field. The same mnemonic can be assigned a 
different value in another field. A partial example of a 
processor model is shown in Figure 5-5. 





Reprinted with permission from Microtec Research, Inc.

Figure 5-4. Overview of the Microtec Research mcASM Development Process


Sample Microword


Microword Definition


Mem:      bit(40), length(1),
          values(0:read, 1:write), default(read);
MAR:      bit(38), length(2),
          values(0:nop,
                 1:load,
                 2:enable,
                 3:ld-en);
Position: bit(32), length(6), default(0);
Width:    bit(27), length(5), default(31);
Am29332:  bit(18),
          values(see file Am29332.def);
Borrow:   bit(17), length(1), default(0);
Hold:     bit(16), length(1), default(0);
Data:     bit(0),  length(16), default(0);


Figure 5-5. Sample Microword Organization 
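
To make the role of the definition table concrete, the following Python sketch packs field assignments into a single microword using the layout of Figure 5-5. It is a hypothetical model, not mcDEF or mcASM code. Two points are assumptions: the Am29332 field is taken to be 9 bits wide (bits 18 through 26, inferred from the neighboring fields), and the numeric Am29332 value in the usage lines is a placeholder, since the contents of Am29332.def are not shown here.

```python
# Field table modeled on Figure 5-5: lowest bit position, length,
# optional default, and optional local mnemonics.
FIELDS = {
    "Mem":      {"bit": 40, "length": 1, "default": "read",
                 "values": {"read": 0, "write": 1}},
    "MAR":      {"bit": 38, "length": 2,
                 "values": {"nop": 0, "load": 1, "enable": 2, "ld-en": 3}},
    "Position": {"bit": 32, "length": 6, "default": 0},
    "Width":    {"bit": 27, "length": 5, "default": 31},
    "Am29332":  {"bit": 18, "length": 9},  # length is an inferred assumption
    "Borrow":   {"bit": 17, "length": 1, "default": 0},
    "Hold":     {"bit": 16, "length": 1, "default": 0},
    "Data":     {"bit": 0, "length": 16, "default": 0},
}


def assemble(assignments):
    """Build one microword from {field: value}; unassigned fields use defaults."""
    unknown = set(assignments) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, spec in FIELDS.items():
        if name in assignments:
            value = assignments[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            raise ValueError(f"field {name} has no default and was not assigned")
        if isinstance(value, str):
            # Mnemonics are local to the field they are defined in.
            value = spec.get("values", {})[value]
        if not 0 <= value < (1 << spec["length"]):
            raise ValueError(f"{value} does not fit in field {name}")
        word |= value << spec["bit"]
    return word


# Usage: 0x01A is a placeholder ALU code, not a real Am29332 opcode.
word = assemble({"MAR": "enable", "Am29332": 0x01A, "Data": 0x1234})
```

Fields can be given in any order, and any field with a default (Mem, Position, Width, Borrow, Hold, Data) may be left out of the assignment.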


In some cases fields may overlap, resulting in several
independent formats being defined for the same bits.
mcDEF provides a structured case statement that describes
each of the formats independently. This allows
very simple selection of the required format within the
assembly source code. Selection may be made by a
specific bit setting, use of a unique field name, or assigning
a value unique to one of the cases.


A case statement demonstrating field overlaying is illustrated
in Figure 5-6.


MICROWORD LAYOUT

<---- 16 bits ---->
(case 0, 2-bits MemCtrl, 14-bits Addr)
        or
(case 1, 16-bits of immediate Data)


mcDEF DEFINITION

case of
  0: begin                      ( two fields )
       addr:    length(12);     ( address field )
       MemCtrl: length(4);      ( memory control field )
     end
  1: begin                      ( or one field )
       data:    bits(16)        ( immediate data field )
     end
endcase;


Figure 5-6. A Variable Format and Case Structure Definition 
In the source program, the format is chosen by specifying
‘data’, or by specifying ‘addr’ and ‘MemCtrl’. Any attempt 
to select both formats will result in an error at assembly 
time. 
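
The selection rule can be modeled as a small lookup: each case owns a set of field names, and using fields from more than one case is an error. The Python sketch below is a hypothetical illustration of that rule using the field names from Figure 5-6; it does not reproduce how mcASM implements the check.

```python
# Field names owned by each case of the overlaid 16 bits (from Figure 5-6).
CASES = {
    0: {"addr", "MemCtrl"},
    1: {"data"},
}


def select_case(used_fields):
    """Return the case implied by the fields used, or None if none is used.

    Raises ValueError when fields from more than one case appear together.
    Fields that belong to no case are ignored.
    """
    used = set(used_fields)
    chosen = [case for case, fields in CASES.items() if used & fields]
    if len(chosen) > 1:
        raise ValueError(f"fields select more than one format: cases {chosen}")
    return chosen[0] if chosen else None


# Usage
case_for_address = select_case(["MAR", "addr", "MemCtrl"])
case_for_data = select_case(["data"])
```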


mcASM - Assembler Program 


Source microcode is assembled by using mcASM, a 
structured microcode macro assembler that produces 
relocatable object modules as output. mcASM reads the
source file and the model definition table as input. Each
statement of source code is then converted into one or 
more microcode words as defined by the definition table. 
The output object module format is relocatable, thereby 
allowing separate modules to be linked into a larger 
executable program. 


Microcode instructions are generated by assigning val- 
ues to the fields that were defined in mcDEF. Assignment 
statements are used to assign values (i.e. fieldname = 
value), allowing the fields to be referenced in any order. 
Fields with acceptable default values do not need to be 
encoded. An example, using the model defined above, is 
shown below. 


loop: Am29332 = INCR-A 
MAR = enable, Addr = fetch ; 


Several features are demonstrated by this example. 


* A single instruction can be continued on several
  lines without special notation.


* Field references can be grouped so that they refer
  to a common device or action. Fields with acceptable
  default values (such as Mem = read) do not
  have to be encoded.


* A reference to the Data field in the microword
  would generate an error because it conflicts with
  the case selection caused by the use of the Addr
  field.


An extensive macro facility allows the user to simplify the 
coding task by representing a large collection of field 
assignments with a single name and a few parameters. 
Macros also allow several microcode words to be gener- 
ated with a single macro definition. The ability of mcASM 
macros to support assignment statements allows the 
user to define a higher level language that greatly re- 
duces coding errors and coding time. For example, the 
instruction in the example above can be replaced with: 


loop: ALU INCR-A; 


where ALU is the macro name. The macro ALU assigns 
the parameter INCR-A to a variable field and fixes the 
values of the rest of the fields such as MemCtrl and Mem. 
Macros can also test the parameter values or names and 
then conditionally generate one of several outputs. 


mcASM allows the programmer to structure microcode 
source into segments. Labels used within a segment are 
local to that segment allowing the labels to be reused in 
other segments with new values. Individual segments 
and collections of segments (modules) are separately 
assembled so that the whole program does not have to 
be reassembled for each change in source code. 


mcLINK - Linking Loader Program 


mcLINK collects the separate segments generated by 
the assembler and combines them into one executable 
program module. In addition, mcLINK supports genera- 
tion of overlays that can be separately loaded into a 
common memory area. 
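
The core of what a linking loader does can be shown with a two-pass sketch: place each segment at an absolute base address, then replace symbolic references with the absolute addresses of their entry points. The model below is deliberately simplified and hypothetical. Each "word" is either an integer or a symbol name standing for a whole-word address reference, which is not how real microwords carry addresses, and overlays are not modeled.

```python
def link(segments, origin=0):
    """Combine relocatable segments into one absolute image.

    Each segment is a dict with:
      "name"    - segment name
      "entries" - {label: offset within the segment}
      "words"   - list of ints, or str symbol names to be resolved
    Returns (image, symbols), where symbols maps labels to absolute addresses.
    """
    symbols = {}
    location = origin
    # Pass 1: assign base addresses and compute absolute entry points.
    for seg in segments:
        for label, offset in seg["entries"].items():
            if label in symbols:
                raise ValueError(f"duplicate entry point: {label}")
            symbols[label] = location + offset
        location += len(seg["words"])
    # Pass 2: resolve symbolic references against the symbol table.
    image = []
    for seg in segments:
        for word in seg["words"]:
            if isinstance(word, str):
                if word not in symbols:
                    raise KeyError(f"unresolved external: {word}")
                image.append(symbols[word])
            else:
                image.append(word)
    return image, symbols


# Usage: "main" refers to the label "fetch" defined in "lib".
main = {"name": "main", "entries": {"start": 0}, "words": [0x10, "fetch", 0x11]}
lib = {"name": "lib", "entries": {"fetch": 1}, "words": [0x20, 0x21]}
image, symbols = link([main, lib], origin=0x100)
```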


Four absolute output formats are provided. Standard 
formats supported by mcASM include AMD AMDASM, 
STEP Engineering GENHEX, and Microtec META29. 
These three formats allow mcASM code to be used with 
existing development systems. A fourth format, called 
mcFMT, includes complete information for implementing 
overlays and performing symbolic debugging. 


While the mcLINK program can generate separate over- 
lay files in addition to the root program files in these three 
standard formats, a single file including overlays and 
symbol information is generated when the mcFMT output
is selected. 


mcPROM - PROM Formatter Program 


Microcode is generally stored in PROMs in target ma- 
chines. mcPROM is provided to divide the absolute linker 
output into separate PROM sized files. These files can 
then be downloaded to a PROM programmer through a 
user supplied communication package. 
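
Dividing a wide microcode image into PROM-sized pieces amounts to slicing each word into columns of the PROM width and cutting the address range into banks of the PROM depth. The sketch below illustrates that idea only; the function name, the (bank, column) keying, and the small sizes in the usage lines are assumptions, and no mcPROM file format is reproduced.

```python
def split_for_proms(words, word_bits, prom_width, prom_depth):
    """Slice a list of wide words into PROM-sized pieces.

    Returns {(bank, column): [values]}, where column 0 holds the
    least significant prom_width bits of each word.
    """
    columns = -(-word_bits // prom_width)  # ceiling division
    mask = (1 << prom_width) - 1
    pieces = {}
    for start in range(0, len(words), prom_depth):
        bank = start // prom_depth
        chunk = words[start:start + prom_depth]
        for column in range(columns):
            shift = column * prom_width
            pieces[(bank, column)] = [(w >> shift) & mask for w in chunk]
    return pieces


# Usage: three 24-bit words into 8-bit-wide PROMs that are two words deep.
pieces = split_for_proms([0x123456, 0xABCDEF, 0x0F0F0F],
                         word_bits=24, prom_width=8, prom_depth=2)
```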


Program Features 


The Microtec mcASM structured microcode assembler 
system has the following features. 


Definition Program Features 


Microword lengths up to 1024 bits.


Variable formats, with multiple fields, predefined in
case statements


Field definition attributes:

BIT     - a field may start at any microword bit
LENGTH  - total field length (max 16 bits) is specified
VALUE   - local mnemonics are assigned to field values
VALID   - only values in this list can be used
DEFAULT - the field is assigned a default value


Value modification operators:

COMPLEMENT - uses two's complement of the value
INVERT     - inverts all the bits
MASK       - removes high bits to set size
REVERSE    - reverses the bit order
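
The four value modification operators are simple bit manipulations on a value confined to the field length. The Python functions below are a sketch of the arithmetic each description implies; they are not mcDEF code, and the function names are chosen only to mirror the operator names.

```python
def mask(value, length):
    """MASK: drop high bits so the value fits in `length` bits."""
    return value & ((1 << length) - 1)


def invert(value, length):
    """INVERT: one's complement within the field."""
    return mask(~value, length)


def complement(value, length):
    """COMPLEMENT: two's complement within the field."""
    return mask(-value, length)


def reverse(value, length):
    """REVERSE: mirror the bit order within the field."""
    result = 0
    for i in range(length):
        if value & (1 << i):
            result |= 1 << (length - 1 - i)
    return result


# Usage on a 4-bit field
examples = {
    "mask": mask(0x1F3, 4),
    "invert": invert(0b0101, 4),
    "complement": complement(0b0001, 4),
    "reverse": reverse(0b0011, 4),
}
```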


Definition program directives:

TITLE       - adds text string to top of each page
INSTRUCTION - defines the width of the microword
(NO)LIST    - (does not generate) generates a listing
(NO)OUTPUT  - (does not generate) generates
              definition table
(NO)XREF    - (does not add) adds cross reference
EJECT       - advances listing to next page
END         - marks end of definition program


Assembly Program Features

Symbolic addressing

Conditional assembly facility

Values assigned to field names

Powerful macro definition commands:

MACRO  - specifies macro name and parameters
BEGIN  - marks the start of the macro definition
LOCAL  - defines symbols local to this macro
GLOBAL - defines symbols global to program
OUTPUT - outputs source code
IF     - processes a statement if variable is true
WARN   - issues text string to output listing
ERROR  - sends text to listing, ends macro
END    - marks end of the macro definition


Flexible macro reference : 


Parameter may precede macro name 
(P1 macro_name P2) 


Positional parameters are assigned values 
Keyword parameters have default values 


Relocatable output with multiple segments:

SEGMENT  - starts or restarts a user-named segment
ENTRY    - lists all entry points to a segment
EXTERNAL - lists all labels defined outside the file


Assembler directives:

PROGRAM    - names first segment and definition file
EQU        - assigns a constant to a name
GLOBAL     - defines variable available to all segments
INCLUDE    - adds additional source file inline
ORG        - sets location counter to new value
TITLE      - adds a text string to each listing page
(NO)LIST   - (does not generate) generates listing file
(NO)OUTPUT - (does not produce) produces output file
(NO)XREF   - (does not generate) generates
             cross reference
EJECT      - advances listing to next page
END        - marks end of assembly source


Link Program Features

Combines independently assembled relocatable
object modules

Resolves external references

Adjusts relocatable addresses into absolute addresses

Versatile user commands:

LINK     - loads specified segments from specified file
ORG      - changes value of location counter
ALIGN    - starts next segment at an address modulo n
OVERLAY  - starts and names an overlay
SET      - defines external symbols at link time
TRANSFER - reads commands from another file
END      - marks end of command entry


Output listing controls:

Load map - area and overlay name, base addresses
Defined and undefined symbol references
Optional symbol cross reference

Object module output in one of four formats:

Microtec mcFMT with overlays and symbols
Microtec META29
STEP Engineering GENHEX
AMD AMDASM and AmSYS29


Conversion Utility Features

* Separates absolute file into PROM size modules
* Format is DATA I/O ASCII hexadecimal or BNPF
* Column overlaying
* Column switching
* Automatic parity generation


Minimum Hardware Required 


Any Digital Equipment Corporation VAX System that op- 
erates under VAX/VMS. The software product typically 
requires 450K bytes of disk storage after installation.


An IBM PC or compatible system that includes at least 
512K bytes of total main memory and one (1) megabyte 
of disk storage. Typically the product requires 600K bytes 
of disk space for permanent installation with additional 
disk storage required for temporary files. Size of tempo- 
rary files depends on the volume of user input. 


Prerequisite Software 


For distributions pre-installed for Digital Equipment Cor- 
poration computer systems, the appropriate VAX/VMS 
operating system. 


For distributions pre-installed for IBM PC or compatible
systems, PC-DOS or MS-DOS version 2.1 or newer.


Support Category — Microtec Research Supported 


During the warranty period, Microtec Research Inc., 
provides the following standard services if the customer 
encounters a problem with the Software Product: 


1. If Microtec Research determines the problem to be 
a defect in the software product, Microtec Research
will provide remedial service by telephone if neces- 
sary (1) to apply a temporary correction or make a 
reasonable attempt to develop an emergency by- 
pass if the software is inoperable, and (2) to assist 
the customer in preparing a Software Performance 
Report (SPR). 


2. If customer diagnosis indicates the problem is 
caused by a defect in the software product, he may 
submit an SPR. Microtec Research will respond to 
problems reported in SPRs that are caused by defects
in the current, unaltered release of the Software
Product via a newsletter. The newsletter
provides notice of the availability of corrected code. 


Any updates to this product released by Microtec Re- 
search during this warranty period will be provided to the 
customer on standard distribution media at prices speci- 
fied in the prevailing Standard License Fee List. Non- 
standard media can be supplied upon request for an 
additional fee. 


Service required because of customer use of other than 
the current, unaltered release of the Software Product 
operated in accordance with the Software Product De- 
scription (SPD) will be provided at Microtec Research’s 
current rates, terms and conditions. 


Ordering Information 


All binary licensed software, including any subsequent 
updates, is furnished under the licensing provisions of 
Microtec Research’s Standard Terms and Conditions of 
Sale. These terms provide, in part, that the software and 
any part thereof may be used on only the single CPU on 
which the software is first installed, and may be copied, 
in whole or in part, (with the proper inclusion of the 
copyright notice and any proprietary notices on the 
software) only for use on this CPU. 


Refer to the Standard License Fee List for further order- 
ing and media information or consult Microtec Research. 


Software Product Service 


Post warranty service for this product is available to 
licensed customers by purchasing a Software Product 
Service Agreement. 


Full Documentation 


Technical reference manuals are included as part of the 
software product. These manuals provide the informa- 
tion needed to use the software product and are written 
to be used in combination with the language reference 
materials provided by the manufacturer of the micropro- 
cessor. Manuals included are: 

• Microtec mcASM User’s Guide 
• Microtec mcASM Reference Manual 
• Microtec mcASM Installation Guide 

For additional information contact: 

Microtec Research, Inc. 
3930 Freedom Circle, Suite 101 
Santa Clara, CA 95054 
(408) 733-2919 


5.4.3 Hilevel Technology, Inc. 
Emulyzer and Hale 


Hilevel’s DS3700 Series Emulyzers provide full microc- 
ode development support for Advanced Micro Devices 
Am29300 Series building blocks. The DS3700 combined 
with HALE (an advanced retargetable Macro-Meta As- 
sembler), with software for firmware integration and 
debug, and with a host computer provides a complete 
microcode development system. 


DS Series Emulyzers 


The DS3700 system employs an internal bit-slice architecture 
combined with ECL design to achieve high speed, decrease 
system latency, facilitate product upgrades, and implement 
unique features. The DS3700 range of features includes: 

• HALE, an advanced Macro-Meta Assembler 
• 10 ns WCS provides 25 ns access times at target 
• 50 MHz logic state analyzer 
• 50 MHz pattern generator 
• Full software support for PC or VAX based operation 
• Interactive source code debugging 
• Source presentation of WCS and trace 
• 16 level unrestricted triggering 
• Microcode performance analysis 
• User-defined display formats with bit permutation for both WCS and logic analyzer data 
• Command language and command file execution of system operations 
• Up to 512 bit wide WCS and trace 


The DS3700 Emulyzer is available in three different 
configurations to accommodate varied Am29300 devel- 
opment needs: 


1) as an integrated microcode development system 
connecting to an IBM-PC/XT/AT or compatible 


2) as a stand-alone microcode development worksta- 
tion connecting to your host computer. 


3) as an Emulyzer using a VT100 compatible terminal 
providing memory emulation and logic analysis. 


The Emulyzer can be remotely operated from virtually 
any host computer, over either the IEEE-488 or RS232 
standard interfaces. A series of specific computer com- 
mands provides a high degree of Emulyzer control and 
programming flexibility, with provisions for rapid data 
transfer. 


Writable Control Stores 


The Writable Control Store (WCS) portion of the DS3700 
Emulyzer is a high-speed memory which can be written 
to or read from by the DS3700 operator, the development 
workstation, the host computer, and your target machine. 
For RAM emulation, the microprogrammer may read and 
write to the WCS from the target processor. WCS 
memory options with access times of 25 ns at the target 
are ideal for high speed Am29300 operation. 


A choice of fifteen different WCS memory modules is 
available to provide the user with a selection of speeds 
and densities to fill any microprogramming application. 
Memory boards are designed to optimize access times. 
All memory modules are 16 bits wide and are available in 
depths of 1K, 4K, or 16K. Modules may be configured in 
parallel for widths up to 512 bits. 
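
The width arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `wcs_modules_needed` is hypothetical, and the only facts taken from the text are the 16-bit module width and the 512-bit maximum array width.

```python
import math

MODULE_WIDTH_BITS = 16      # each WCS memory module is 16 bits wide (from the text)
MAX_ARRAY_WIDTH_BITS = 512  # widest WCS array the text describes


def wcs_modules_needed(microword_bits: int) -> int:
    """Return how many 16-bit modules must sit in parallel
    to hold a microword of the given width."""
    if not 1 <= microword_bits <= MAX_ARRAY_WIDTH_BITS:
        raise ValueError("microword width must be between 1 and 512 bits")
    return math.ceil(microword_bits / MODULE_WIDTH_BITS)


# A 100-bit microword rounds up to 7 modules (112 bits of store).
print(wcs_modules_needed(100))
```

Rounding up matters because a microword that is not a multiple of 16 bits still occupies a whole module for its last few bits.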


The DS3700 Series can support WCS arrays up to 16K 
deep or 512 bits wide. Additionally, the WCS may be 
configured to support multiple arrays with each array 
configured for a unique size and speed. 


Logic Analyzer 


The DS3700 Series Logic Analyzer section is configured 
in 16 bit increments. Each increment may be clocked 
independently, or any number of these can be clocked 
synchronously. Trigger words may be defined across the 
entire trace width and qualified with ANDs, ORs, comple- 
ment, and not equal. Up to 256 trace channels are 
available in a single chassis; however, chassis may be 
chained for greater widths. Either 4K or 16K deep trace 
memories are available at 25 MHz, 35 MHz, and 50 MHz. 


Trace synchronization is nominally provided via selection 
of one of five clocks. Alternatively, each channel group 
(16 data channels/one clock per group) can be synchronized 
to compensate for clock delays, skewing, and multiple 
timebases. The DS3700 clocking scheme allows address (or 
data) to be delayed one clock cycle to align the address 
trace with its associated data. 
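
The effect of the one-clock delay can be illustrated in software. The sketch below assumes a target in which the data for an address appears one clock after the address itself; the function name `align_delayed` and the sample values are hypothetical and do not describe the DS3700 hardware.

```python
def align_delayed(address_trace, data_trace, delay=1):
    """Pair each address with the data captured `delay` clocks later."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    # Dropping the first `delay` data samples has the same effect as
    # delaying the address channel by that many clocks.
    return list(zip(address_trace, data_trace[delay:]))


addresses = [0x10, 0x11, 0x12, 0x13]
data = [None, 0xA0, 0xA1, 0xA2]  # data for clock n's address shows up at clock n+1
print(align_delayed(addresses, data))
```

After alignment each row of the listing shows an address next to the data it actually fetched, which is what makes the trace readable.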


Symbols for trace disassembly and triggering are auto- 
matically created by HALE (Hilevel's Assembler). Addi- 
tional symbols may be defined and stored in the symbol 
table. The symbol table can be saved and restored for 
future use. 


The DS3700 has four triggering modes. 


Single Trigger: Single matchword defined across all 
address and data trace bits with don’t care bits. 


External Trigger: A hardware input may be pro- 
grammed to act as a trigger, conditional trigger, or arming 
condition. 





Reprinted with permission from Hilevel Technology, Inc. 
Multi-Level Trigger: Provides 16 levels of trace control 
with up to 4 conditions per level. Multiple commands 
(thirteen total) may be executed on the current clock 
cycle in real-time for any of the 4 conditions. Trigger 
patterns may be specified across the entire address and 
data fields including “don’t care” bits. 


Unlimited Break Points: Provides either 16K, 64K, or 
1M of address breakpoints/triggers. 
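
The match-word and multi-level ideas in the triggering modes above can be modeled as follows. This is a simplified software sketch, not the DS3700 implementation: the names `matches` and `run_levels`, the tuple layout, and the two action types are assumptions made for illustration, and each condition here carries one action rather than several.

```python
def matches(word, value, care_mask):
    """True when `word` equals `value` on every bit set in `care_mask`;
    bits cleared in the mask are "don't care"."""
    return (word & care_mask) == (value & care_mask)


def run_levels(trace, levels, start_level=0):
    """Walk a trace through a multi-level trigger program.

    `levels` maps a level number to a list of (value, care_mask, action)
    conditions; an action is ("jump", level) or ("trigger",).
    Returns the trace index at which the trigger fired, or None.
    """
    level = start_level
    for index, word in enumerate(trace):
        for value, care_mask, action in levels[level]:
            if matches(word, value, care_mask):
                if action[0] == "trigger":
                    return index
                level = action[1]  # "jump": continue at another level
                break              # only the first matching condition acts
    return None


# Level 0 waits for any word in 0x40-0x4F; level 1 then triggers on 0x7F exactly.
program = {
    0: [(0x40, 0xF0, ("jump", 1))],
    1: [(0x7F, 0xFF, ("trigger",))],
}
print(run_levels([0x7F, 0x12, 0x43, 0x55, 0x7F], program))
```

The first 0x7F in the sample trace is ignored because the sequencer is still at level 0, which is the point of arming a trigger in stages.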


The DS3700 provides 16 active user-defined trace dis- 
play headings and data formats. Any 4 bits of the trace 
data may be used to change display formats dynamically. 
In addition, symbols may be defined across the entire 
address and data fields and displayed along with the 
formatted data. 


Trace masking is achieved by entering mask addresses 
in a table and then toggling the trace mask function on or 
off. 


Trace permutations (as well as WCS permutations) are 
available to permute the order of display for clear 
presentations of the data. 
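
A display permutation simply reorders bits before they are shown. The following sketch illustrates the idea; the function name `permute_bits` and the convention that the list gives source bit numbers from left to right are assumptions, not the Emulyzer's own notation.

```python
def permute_bits(word, order):
    """Rearrange the bits of `word` for display.

    `order[i]` is the source bit number (0 = least significant) shown at
    display position i, where position 0 is the leftmost character.
    Returns the permuted bits as a string of "0"/"1".
    """
    return "".join(str((word >> src) & 1) for src in order)


# Show an 8-bit word with its two nibbles swapped: low nibble first.
order = [3, 2, 1, 0, 7, 6, 5, 4]
print(permute_bits(0b10100011, order))
```

Grouping related microword fields side by side in this way changes only the presentation; the stored word is untouched.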


During debug, using the Interactive Trace Disassembler 
with the DS3700 allows viewing of both the formatted 
trace with symbols and the related source code with 
comments. 


Additionally, trace data may be displayed graphically as 
waveforms. Movement of linear cursors permits comparison 
of waveforms and viewing of timing information. 


Microcode Performance Analyzer 


The TIM-1E option provides an asynchronous clock for 
time-tag and performance analysis operations. Resolu- 
tion of the clock may be set to either 15 ns or 250 ns in 
three operating modes: 


Absolute Time: Allows elapsed time to be measured 
from any selected event; multiple reference points may 
be defined. 


Time Interval: Provides a measurement of the time 
interval between adjacent trace data or any locations in 
the trace buffer. 


Performance Analysis: Up to 15 groups of addresses 
may be defined as performance groups. 


Performance groups of addresses can be defined to 
generate statistical performance analysis histograms, 
address vs. frequency of address and address groups vs. 
time spent in groups, to allow the engineer to measure 
firmware efficiency. For example, time spent in subrou- 
tines, interrupt handlers, and in arithmetic functions can 
be measured. Dynamic graphing is available to actually 
view the performance in real time. 
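
The group-versus-time histogram described above amounts to summing the intervals between time-tagged samples. The sketch below shows one way to compute it; the function names, the group names, and the sample data are hypothetical, and it assumes each sample stays active until the next time tag.

```python
def time_per_group(samples, groups):
    """Accumulate the time spent in each address group.

    `samples` is a list of (timestamp_ns, address) pairs in time order;
    each sample is assumed to stay active until the next timestamp.
    `groups` maps a group name to an inclusive (low, high) address range.
    Returns {name: nanoseconds}; the final sample has no duration.
    """
    totals = {name: 0 for name in groups}
    for (t0, address), (t1, _) in zip(samples, samples[1:]):
        for name, (low, high) in groups.items():
            if low <= address <= high:
                totals[name] += t1 - t0
    return totals


def as_percent(totals):
    """Convert absolute times to relative percentages for a histogram."""
    whole = sum(totals.values())
    return {name: (100.0 * t / whole if whole else 0.0) for name, t in totals.items()}


groups = {"multiply": (0x100, 0x13F), "interrupt": (0x200, 0x21F)}
samples = [(0, 0x100), (250, 0x101), (500, 0x200), (1500, 0x102), (2000, 0x000)]
totals = time_per_group(samples, groups)
print(totals, as_percent(totals))
```

The two return values correspond to the absolute and relative views of the same measurement.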


Pattern Generator 


The PG201 Option allows the Emulyzer to function as a 
digital stimulus response tester. Sequential or pro- 
grammed vectors (or instructions) may be applied to the 
target and the response recorded. Using the Emulyzer 
Programming Language, the trace may be uploaded and 
compared to a known good file. The multilevel trigger 
may be used to set conditions for the pattern generator 
so that different vectors may be applied after a certain 
response has been recorded. The PG201 card also 
allows fast firmware-generated patterns to be inserted 
anywhere within the WCS. Walking ones, walking zeros, 
checkerboard, and random patterns may be merged with 
writable control store or used to fill the WCS. The PG201 
may be used to emulate a controller, such as the 
Am29PL141, which controls or sequences the target 
hardware. 
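
For readers unfamiliar with the named fill patterns, here is a small sketch of how walking ones, walking zeros, and a checkerboard can be generated. It is written for illustration only; the function names are hypothetical and nothing here reflects how the PG201 produces its patterns.

```python
def walking_ones(width):
    """One word per bit position, with a single 1 moving from LSB to MSB."""
    return [1 << bit for bit in range(width)]


def walking_zeros(width):
    """Complement of walking ones: a single 0 moving through all 1s."""
    mask = (1 << width) - 1
    return [mask ^ (1 << bit) for bit in range(width)]


def checkerboard(width, depth):
    """Alternate ...0101 and ...1010 on successive words."""
    mask = (1 << width) - 1
    even = sum(1 << bit for bit in range(0, width, 2))  # ...0101
    return [even if row % 2 == 0 else even ^ mask for row in range(depth)]


for word in walking_ones(4) + walking_zeros(4) + checkerboard(4, 2):
    print(f"{word:04b}")
```

Patterns like these are commonly used to expose stuck or shorted bits, since every bit position is driven both high and low against its neighbors.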


HALE - An Advanced Retargetable 
Macro-Meta Assembler 


• Includes Am29300 Definition Files 
• Increases User Productivity 
• Allows Coding Optimization 
• Pipeline Macros Ideal for Am29300 Blocks 
• Assembles on Several Computers 
• Relocatable Linkable Code 
• Matched to Development System 


HALE provides the microprogrammer with a set of facili- 
ties to rapidly create instruction sets and quickly write, 
assemble, and check his programs against design rules. 
For building custom instruction sets or emulating instruc- 
tion sets, HALE increases programming efficiency and 
gets the job done fast. 


HALE supports several programming techniques to 
accommodate varied programming styles and architec- 
tural requirements. Free-formatting, fixed-format instruc- 
tions, position-independent code, macros, and pipeline 
macros each provide specific programming benefits. 
Techniques are often mixed in programs to provide the 
optimum control and ease of programming. 


Am29300 programmers using HALE receive the benefits 
of an assembler that allows source presentations (your 
actual instruction), comments, and symbolic debug when 
used with a HILEVEL DS Series Emulyzer. These integration 
tools speed development. 


HALE is easy to use and quickly learned. Generating 
productive code with HALE begins within the first few 
minutes of use. Straightforward coding and simple 
definitions of powerful high-level macros allow 
code to be tested right away. 
Pipeline macros allow the programmer to optimize the 
utilization of his hardware resources. Permitting 
macros for fields, combinations of fields, or along 
functional boundaries, and allowing multiple invocations of 
the macros while the earlier calls are still generating code, 
allows highly overlapped and compacted code to be 
written. 


Pipeline macros are particularly useful for the Am29300 
series since they are designed along functional bounda- 
ries. Pipeline macros written for the multiplier 
(Am29C323), a floating point processor (Am29325), and 
an arithmetic logic unit (Am29332) in an architecture 
combining these resources would allow tight control and 
economy of code for their independent and interdepend- 
ent operations. 


Pipeline macros are well suited for n-stage pipelined 
architectures, DSP algorithms, pipelined multiplier op- 
erations, and adding programming elegance. Once pipe- 
line macros are written for an element, they are invoked 
and closed out with two simple commands. Up to eight 
pipeline macros can be operated simultaneously. Pipe- 
line macros are position independent. 


Calls to pipeline macros are limited only by the process- 
ing element’s latency period, allowing maximum data 
flow processing. Pipeline macros also simplify coding for 
elements that introduce pipeline delays into the target 
hardware. 


Pipeline macros may contain conditional assembly state- 
ments allowing the automatic selection of microcode 
sequences for a given operation. 
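
The overlapping behavior of pipeline macros can be pictured with a small model. The sketch below is not HALE syntax and makes several assumptions: a microprogram is modeled as a list of dictionaries (one per microword, field name to value), a macro is a list of per-clock field assignments, and the names `invoke`, `MULT`, `mul_in`, and `mul_out` are all hypothetical.

```python
def invoke(program, start, stages):
    """Overlay a pipeline macro onto `program` beginning at word `start`.

    `program` is a list of dicts (one per microword, field name -> value);
    `stages` is a list of dicts, one per clock of the macro. The program is
    extended as needed. A field already assigned in a target word raises
    ValueError, standing in for an assembler conflict error.
    """
    while len(program) < start + len(stages):
        program.append({})
    for offset, fields in enumerate(stages):
        word = program[start + offset]
        for name, value in fields.items():
            if name in word:
                raise ValueError(f"field {name!r} already used in word {start + offset}")
            word[name] = value
    return program


# Hypothetical two-clock multiply: load operands, then read the product.
MULT = [{"mul_in": "LOAD"}, {"mul_out": "READ"}]

program = []
invoke(program, 0, MULT)  # first call occupies words 0 and 1
invoke(program, 1, MULT)  # second call starts while the first is still emitting
for address, word in enumerate(program):
    print(address, word)
```

Word 1 ends up carrying the read of the first multiply and the load of the second, which is the kind of compaction the text describes; a call placed closer than the element's latency allows shows up as a field conflict.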


User definable errors allow the microprogrammer to 
assert design rules and check his code against them. 
This saves time by catching errors during assembly 
rather than at debug and integration time. When mi- 
croarchitectural constraints change, the program may be 
reassembled with new rules and checked against them. 
Instead of searching for potential errors, valuable time is 
saved by the automatic detection of errors. 


User definable warnings allow the programmer to write 
non-assembling messages at any location in the source 
program. These messages may be used to follow assem- 
bly program flows or flag untested routines. Incomplete 
cases within macros may be detected by inserting a 
warning message as the last case. If an undefined case 
is called, the warning will be displayed. Warning mes- 
sages assist the programmer in directing his attention to 
areas of concern and correcting them before they show 
up as problems during firmware integration time. 


While and Endwhile looping directives allow code between 
these directives to be generated as long as a 
user-specified boolean equation is true. While A<B, While 
A+B<C, and While A=B are examples showing the versatility 
of this directive. “While loops” may be nested up 
to 15 levels deep. “While loops” are also particularly 
useful in pattern generation applications. 
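
Assembly-time looping of this kind can be modeled as repeated code generation under a condition. The sketch below is an illustration in Python rather than HALE source; the function names and the `WORD` output line are hypothetical.

```python
def expand_while(condition, body, variables, limit=10_000):
    """Repeat `body` while `condition(variables)` is true.

    `body(variables)` returns the lines generated for one pass and may
    update `variables`; `limit` guards against a loop that never ends.
    """
    lines = []
    passes = 0
    while condition(variables):
        if passes == limit:
            raise RuntimeError("while loop did not terminate")
        lines.extend(body(variables))
        passes += 1
    return lines


def emit_pass(v):
    line = f"WORD {v['A']}"
    v["A"] += 1  # advance the assembly-time variable
    return [line]


# Equivalent in spirit to "While A<B ... Endwhile" with A=0 and B=3.
print(expand_while(lambda v: v["A"] < v["B"], emit_pass, {"A": 0, "B": 3}))
```

Because the loop runs at assembly time, its cost is paid once in generated code size rather than in target execution time.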


ASCII statements convert ASCII code to its binary 
equivalent, which may then be imbedded within the 
microcode. Data may be coded directly into microcode in 
ASCII format. ASCII conversions are useful for passing 
messages, strings, or variables from one part of your 
target to another. 
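
As a concrete picture of the conversion, the sketch below turns a string into one binary word per character. The function name `ascii_to_words` and the default 8-bit word size are assumptions chosen for illustration.

```python
def ascii_to_words(text, word_bits=8):
    """Encode `text` as ASCII and return one binary string per character,
    zero-padded to `word_bits` bits."""
    if word_bits < 7:
        raise ValueError("ASCII needs at least 7 bits per character")
    return [format(byte, f"0{word_bits}b") for byte in text.encode("ascii")]


print(ascii_to_words("OK"))  # ['01001111', '01001011']
```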


Macro facilities allow the assignment of a name to either 
a single microinstruction or to a sequence of 
microinstructions. Macros allow parameters to be passed to 
points within the macro body. A multiply macro may 
consist of 100 lines of code, yet may be invoked by a 
single call (e.g., Mult A,B). Macros permit the generation 
of assembly language for your target or even higher level 
languages if one builds macros from macros. Macros 
may be nested up to 15 levels deep. Macros may call 
pipeline macros to generate extremely powerful code. 
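
Parameter passing into a macro body can be sketched as plain text substitution. The example below is a Python model, not HALE syntax; `define_macro`, the `{x}` placeholder style, and the three mnemonic lines are all hypothetical.

```python
def define_macro(params, body):
    """Return a function that expands `body` with arguments substituted.

    `params` lists the formal parameter names; each occurrence of
    `{name}` in a body line is replaced by the matching argument.
    """
    def expand(*args):
        if len(args) != len(params):
            raise TypeError(f"expected {len(params)} arguments, got {len(args)}")
        bindings = dict(zip(params, args))
        return [line.format(**bindings) for line in body]
    return expand


# Hypothetical three-line body standing in for a much longer multiply routine.
mult = define_macro(["x", "y"], [
    "LOAD  MULTIPLIER, {x}",
    "LOAD  MULTIPLICAND, {y}",
    "START MULTIPLY",
])
print("\n".join(mult("A", "B")))
```

A single call such as `mult("A", "B")` stands in for the whole body, which is the economy the text attributes to a one-line `Mult A,B` invocation.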


Conditional assembly statements can be used to 
generate high-order instructions that can accomplish a 
number of things based upon variable inputs: for ex- 
ample, executing either signed or unsigned functions, 
selecting the correct microcode for a specific task (auto- 
matic instruction selection), or interrogating the hard- 
ware and conditionally executing different microcode 
sequences (context switching). Conditional assembly 
statements allow the construction of powerful macros. 


String facilities are used to identify variables and compare 
entire strings or portions of strings with each other. 
When combined with other assembly directives, different 
routines based upon the results of the compares can be 
invoked. 


Expressions, operators, and modifiers allow versatile 
assembly program control. Addition, subtraction, multi- 
plication, division, less than, greater than, equal to, and 
combinations thereof can be used to generate and 
modify variables. Other commands available include 
shifting, negation, modulo addressing, relative address- 
ing, and absolute addressing. 


HALE’s PROM formatter outputs in HILEVEL ASCII, 
AMDASM, DATA I/O, and Intel Intellec Hex to adapt to 
your specific PROM programming needs. 
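
To show what one of these output formats looks like, the sketch below writes records in the widely used Intel HEX layout (byte count, 16-bit address, record type, data, checksum). It assumes that the "Intel Intellec Hex" option corresponds to this layout; the function names are hypothetical and this is not HALE's formatter.

```python
def intel_hex_record(address, data, record_type=0x00):
    """Format one Intel HEX record: ':' + count + address + type + data + checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if len(data) > 0xFF:
        raise ValueError("a record holds at most 255 data bytes")
    payload = bytes([len(data), address >> 8, address & 0xFF, record_type]) + bytes(data)
    checksum = (-sum(payload)) & 0xFF  # two's complement of the byte sum
    return ":" + (payload + bytes([checksum])).hex().upper()


def intel_hex_dump(data, start=0, bytes_per_record=16):
    """Split `data` into data records and append the end-of-file record."""
    lines = [
        intel_hex_record(start + offset, data[offset:offset + bytes_per_record])
        for offset in range(0, len(data), bytes_per_record)
    ]
    lines.append(intel_hex_record(0, b"", record_type=0x01))
    return lines


print("\n".join(intel_hex_dump(bytes([0x01, 0x02, 0x03, 0x04]))))
```

The checksum is chosen so that all bytes of a record, checksum included, sum to zero modulo 256, which lets a PROM programmer detect a corrupted line.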


HALE allows the linking of relocatable code so that 
several software modules may be developed in parallel, 
allowing completion of the programming task sooner. 


Over 4000 source and definition symbols allow virtually 
unlimited amounts of code to be written. Word widths of 
up to 256 bits are supported, accommodating highly 
parallel architectures. 
Figure 5-7. Flow diagram with three boxes: "Use HALE to define instruction set and write applicable software"; "Use PATCHWORK to correct errors and pass this information back to HALE"; "Test system using DS3700 Emulyzer". 


HALE runs on the IBM-PC/XT/AT, VAX, and Apollo 
computers. HALE runs all programs developed using 
AMDASM or Microtec Meta Assemblers, assuring the 
best possible return on your software investment. 


Software Tools for Firmware Integration 
and Debug 


Patchwork for fast effective microcode changes 


Patchwork is an interactive assembler that permits the 
user to write the patches in assembly mnemonics and 
immediately test them. Temporary patches can be easily 
made and removed based upon the date they were 
made. Patchwork records each change, comments, date 
and time. Each change that creates new object code is 
appended to the listing and source files. In addition, a log 
file maintains a complete record of the entire editing 
session. 


Alternatively, the user can utilize the object code editor in 
the DS3700 to make changes in the microcode residing 
in the WCS. In this mode, the WCS data is displayed in 
the same format as the HALE Macro-Meta Assembler 
object code listing. 


Single-Step for tough debugging problems 


The Single-Step program allows examination of the 
trace, source code, and comments together on a line by 
line basis. Each line shows what instruction was exe- 
cuted and what in fact happened. Using Single-Step, 
problems stand out and solutions often become apparent. 
Invoke Patchwork, make the desired changes, and 
Single-Step again. For programmers writing code or 
maintaining it, the line by line comments allow quick 
recognition and interpretation of the instructions, thus 
reducing debug time. 

Formatted Trace for full speed debugging 


Formatted trace helps find errors that occur during real 
time execution. After a full speed run Formatted Trace 
allows stepping through the trace buffer presenting 
source code and comments together. This allows fast 
identification of problem areas, and points to instructions 
causing problems. 


Trace Waveform for full logic analysis 


Trace Waveform conveys a visual historical record of 
target board operation at a glance. It allows converting all 
or any combination of trace channels into timing diagrams. 
Labels may be assigned to each trace channel for 
clarity and recognition. A label file (containing the names 
of your traces) and a setup file (which holds parameters 
such as magnifications and scroll modes) can be created, 
saved, and conveniently accessed in future uses. 
Cursor controls make comparison of non-adjacent waveform 
edges easy. Channel order may be permuted. 


Screen Driven 


The Hilevel Emulyzer provides screens for convenient 
system setup and operation. Each screen may be configured, 
saved, and restored by the operator or by the 
Emulyzer Programming Language. The full range of 
Emulyzer operations is contained within the screens: 
for example, writing multilevel trigger programs, 
setting the logic analyzer’s breakpoints, running and 
tracing the microcode program, and analyzing microcode 
performance. Each screen is designed for maximum 
utility and optimum information display. 


Automated Emulyzer Operation 


EPL (Emulyzer Programming Language) automates the 
Emulyzer operation through the use of high-level commands. 
EPL permits the execution of command files that 
are used to set up the development environment (download 
the WCS, download multilevel trigger programs, 
download display formats, etc.) and later save it. This 
allows multiple users fast and easy access to the 
development system while managing their files safely. 


Microcode Quality Control 


Microcode Quality can be assured by repetitive testing. 
EPL provides commands that allow looping, uploading 
trace data and comparisons against known good files. 
Using EPL, extended tests can be used to catch elusive 
program bugs. 
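
The loop, upload, and compare cycle described above can be modeled in a few lines. The sketch below is written in Python, not EPL; the function names are hypothetical, and `run_once` is a stand-in for whatever step runs the target and uploads the trace.

```python
def compare_trace(trace, golden):
    """Return a list of (index, got, expected) for every mismatching sample.
    A length difference is reported with None for the missing side."""
    mismatches = []
    for index in range(max(len(trace), len(golden))):
        got = trace[index] if index < len(trace) else None
        expected = golden[index] if index < len(golden) else None
        if got != expected:
            mismatches.append((index, got, expected))
    return mismatches


def repeat_test(run_once, golden, passes):
    """Run the target `passes` times; return the first failing pass
    number and its mismatches, or None if every pass matched."""
    for number in range(1, passes + 1):
        mismatches = compare_trace(run_once(), golden)
        if mismatches:
            return number, mismatches
    return None


# Simulated target that misbehaves only on its third run.
golden = [0x10, 0x11, 0x12]
runs = iter([[0x10, 0x11, 0x12], [0x10, 0x11, 0x12], [0x10, 0x1F, 0x12]])
print(repeat_test(lambda: next(runs), golden, passes=3))
```

Reporting the pass number together with the mismatching samples is what makes an intermittent fault reproducible enough to investigate.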

System Software 


Hilevel’s system software allows the user to customize 
his development system. Keys may be assigned to 
invoke any program including HALE, EPL, Patchwork, 
Single Step, Formatted Trace, and Waveform. Often- 
used keyboard routines may be defined as keyboard 
macros and are invoked with a single keystroke. 


In-Circuit Emulators 


HILEVEL In-Circuit Emulators are available for a variety 
of microcoded processors and support devices. Emula- 
tion is accomplished by placing the target device in a 
socket on the appropriate emulation pod and plugging 
the pod into the device socket in the system. The pod is 
controlled by the EC1000 controller, which can accom- 
modate up to four pods simultaneously. The EC1000 
features a built-in keyboard and LCD display to support 
stand-alone operation. 


The EC1000 may be connected to the DS3700 Develop- 
ment System, allowing the microprogrammer to control 
the Emulyzer and review data using the development 
system console. Using the EC1000 in concert with the 
development system also takes advantage of the 
DS3700’s multi-level triggering capabilities. 


All control and display capabilities necessary for comprehensive 
device emulation are designed into the EC1000: 

• Decimal, Hex, Octal, Binary, ASCII 
• Target single step or multiple step capability 
• Displays registers whose contents match specified data 
• Allows changes to any part of any register 
• Allows control to be transferred to DS3700 or VT100 compatible terminal 
• EEPROM allows customization of default parameters 
• External trigger allows external logic or test equipment to halt the Emulator 


Emulation pods currently offered by Hilevel for Advanced 
Micro Devices are the Am2910 sequencer, Am29116 
ALU, and Am29PL141 Fuse Programmable Controller. 


For additional information contact: 

Hilevel Technology, Inc. 
18902 Bardeen 
Irvine, CA 92715 
(714) 752-5215 
TLX 655-316 


DS3700 SERIES SPECIFICATIONS 


Writable Control Store (WCS) 

Depth: 1K to 64K; depending on memory configuration. 

Array Width: 0 to 512 bits in 16-bit increments. 

RAM Speed: 10 ns to 120 ns; depending on memory module selected. 

System Access Time: 25 ns to 140 ns; depending on memory module and pod selected. 

Number of Independent Arrays: 16 maximum. 

Target Control: Break (Halt), clear, single-step, continuous slow step, full speed emulation, break on event(s), PROM enable. 

Editing Modes: 
DS3700: Screen oriented editing with full search, scroll, page and window operation. 
DS3700/CS: Full Interactive Source Code Debug. 

WCS MEMORY MODULES: See the WCS MEMORY MODULES table below. 


WCS INTERFACE PODS 

POD Types: Data, Address, Master Pods. 

Logic Type: TTL, 10K ECL, or 100K ECL. 

Output Signals: 
Data Pods: 16 Data bits per pod. 
Master Pods: 16 Data bits, clock enable, target reset, 2925 run control. 
Address Pods: Clock enable, target reset, ROM enable, 2925 run control. 

Signal Inputs: 
Address Pods: 16 Address bits, clock input. 
Master Pods: 16 Address bits, clock input, PROM enable. 

Target Connection: Connector or PROM socket. 

Type of Memories Emulated: ROM, PROM, SRAM. 

Additional Support: 
Registered Memories: Yes, with initialization. 
Chip Select/Chip Enable: Up to 3. 

Pod Size: 
Data and Address Pods: 0.75" H x 2.75" W x 4" L. 
Master Pods: 1.5" H x 2.75" W x 4" L. 


Logic State Analyzer (Trace) 

Number of Input Channels: 
DS3700 Mainframe: 0 to 80 channels in 16 channel increments. 
DT37XX Mainframes: 0 to 256 channels in 16 channel increments. 

Maximum Clock Rate: 25 or 35 MHz; depending on type of trace memory selected. 


TRACE MEMORY MODULES: 

Model          Depth   Speed    Width 
TRC/MLT-25     4K      25 MHz   16 bits 
TRC/MLT-35     4K      35 MHz   16 bits 
TRC16/MLT-25   16K     25 MHz   16 bits 

TRIGGER, BREAKPOINT AND 
TRACE CONTROL MODES 

Modes: External trigger, single event trigger, 
unlimited break/trigger (UBE option), and multi-level 
trigger/trace control. 

Mode Combinations: Any combination, except single event 
trigger and multi-level trigger, can be used 
simultaneously. 




External Trigger: 
Input: BNC connector 
Level: TTL 
Active State: Negative-going transition 

Single Event Trigger: Single level condition specified 
across entire address and data fields. 

Unlimited Break/Trigger: 
Description: Address field can be 
used to specify trigger/breakpoint 
events for simultaneous monitoring. 


Address Range: Option Range 
UBE-16 16K 
UBE-64 64K 


Type of Trigger: Any address or 
address range may be specified as a 
trigger, conditional trigger or arming 
word. 


Multi-Level Trigger/Trace Control: 


Number of Levels: 16 independent 
levels. 


Conditional Patterns: 4 per level 
across entire address and data fields. 


Condition Formats: Bit patterns with 
user defined format, and symbols (user 
defined or assembler generated). 


Boolean combination of symbols: 
Symbols may be combined with the 
following expressions: AND, OR, 
COMPLEMENT, NOT EQUAL 


Multiple Action Commands: Up to 9 
concurrent commands per condition 


Action Commands: 13; as shown 
below. 
1. Trigger 
2. Conditional Trigger 
3. Arm Trigger 
4. Unarm Trigger 
5. Reset Trigger 
6. Disable Trace 
7. Enable Trace 
8. Override Trace Disable 
9. Disable Trace Mask 
10. Zero Timer 
11. Jump to level <N> 
12. Initialize loop/event counter 
13. Assert Pattern Generator 
Conditional Control 


Loop/Event Counter: Up to 65,535 
events 


Trigger Delay: 0 to 4095 clock cycles 


Breakpoints: Independent on/off 
control 


TRACE MODES 
Modes: State analysis; State timing, 
absolute elapsed time; State timing, 
Interval; Performance analysis and 
Dynamic performance graphing 
State Timing (absolute and interval): 
Resolution: 15 ns or 250 ns, selectable 
Maximum Time: 

Low Resolution: 16 minutes 

High Resolution: 1 minute 

Using Trace control: >16 hours 
Performance Analysis (TIM-1E and 
UBE options): 
Number of Groups: 15 
Group definition: Any subset of the 
address range. 


Address Range: Option Range 
UBE-16 16K 
UBE-64 64K 
Operation: Logic analyzer stores 
group transitions. 


Display: Both histogram and absolute 
time chart. 


Histogram: Relative % of execution 
time used by each defined group. 


Absolute: Total execution time of 
each group. 


Group Name: Up to 15 characters. 


Time Resolution: 15 ns or 250 ns, 
selectable. 


Dynamic Performance Graphing 
Number of Groups: 15 


Group definition: Any subset of the 
address space. 

Address Range: 64K 

Operation: Logic analyzer dynamically 
updates trace memory and displays 
graph of percentage of events within 
each group. 

Display: Histogram 


SYMBOLIC TRACE 


Description: Symbols may be defined 
using entire address and data fields. 

Display: Symbols will be displayed 
along with user formatted data. 

Use: Symbols may be used for trace 
display, trace control/trigger condition 
statements, search/locate operations, 
and time interval measurements. 


Source: Symbols may be defined 
using DS3700 menu or downloaded 
from HALE definition files. 


Maximum Characters per Symbol: 15 

Maximum Number of Symbols: 
Depends on number of characters per 
symbol and width of data fields. >1000 
symbols with average of 7 characters 
when defined on address field. 


TRACE MASK (UBE OPTION) 
Description: Unconditionally masks 
from trace any user specified address 
or range of addresses. 

Maximum Mask: Any subset of 
address range. 


Address Range: Option Range 
UBE-16 16K 
UBE-64 64K 
TRACE PODS 


Logic Type: TTL, 10K ECL, or 100K 
ECL. 


Signal Inputs: 16 data bits, clock. 


Display Formatting 

DS3700: Any user selected combination 
of hexadecimal, binary, and/or octal. 


DS3700/CS: Full interactive WCS and 
Trace Disassembly. 


Multiple Formats: Any 4 bits of each 
array and trace may be used to select 
between 16 user specified formats. 


User Defined Headings: 
Maximum number of characters: 256 


Multiple headings: Up to 16 to match 
multiple formats. 


Display Permutation: Any bit may be 
displayed in any position within WCS 
and Trace displays. 


DS3700 Mainframe 
WCS Size: Accepts up to 8 WCS 
memory modules (128 bits). 


Number of Arrays: One 


Trace size: Accepts up to 5 trace 
memory modules (80 channels). 




Interfaces: 

RS232: 3 ports 

High Speed Parallel: 1 port 

GPIB (IEEE-Std-488): 1 port (Op- 
tional) 

BNC Inputs: External clock, external 
trigger 

BNC Outputs: Arm output, trigger 
output. 

Annunciation: Front panel LEDs show 
status of trigger, GPIB interface, 
clocks, and operational controls. 


DT37XX Mainframe 
WCS Size: None, requires EXP3700 
for WCS operation. 


Trace Size: Accepts up to 16 trace 
memory modules (256 channels). 


Interfaces: 

RS232: 3 ports 

High Speed Parallel: 1 port 

GPIB (IEEE-Std-488): 1 port (Optional) 

BNC Inputs: External clock, external trigger 

BNC Outputs: Arm output, trigger output. 

Annunciation: Front panel LEDs show status of trigger, GPIB interface, 
clocks, and operational controls. 


EXP3700 Expansion Chassis 
WCS Size: Accepts up to 16 WCS memory modules (256 bits). 

Number of Arrays: May be configured as one or two arrays. 


WCS MEMORY MODULES 

Model        Depth   RAM Speed (ns) 
E1K-10       1K      10 
M1K-20*      1K      20 
M1K-35*      1K      35 
E4K-10       4K      10 
E4KW-10      4K      10 
E4K-25       4K      25 
E4KW-25      4K      25 
M4K-25*      4K      25 
M4K-35*      4K      35 
M4K-120*     4K      120 
E16K-25      16K     25 
E16KW-25     16K     25 
M16K-35*     16K     35 
M16K-70*     16K     70 
M16K-120*    16K     120 

[Additional table columns: Emulation (PROM, RAM); System Speed (ns)**: 25, 35, 40, 50, 90, 140. Per-model entries not recoverable.] 

*M Series memory modules require EXP370-4 expansion chassis. 

**Access times specified at target side of pod. 


Operating Specifications 
(DS3700, DT37XX, EXP3700 chassis) 

Chassis Size: 7" H x 18" W x 23" D 
Weight: 60 to 70 lbs depending on options included. 
Operating Temperature: 15°C to 35°C 
Operating Humidity: 10 to 80% RH 
Power Requirements: 90 to 132 VAC, or 180 to 250 VAC; 50 or 60 Hz. 
Warranty: 1 year limited warranty. 


For additional information contact: 

Hilevel Technology, Inc. 
18902 Bardeen 
Irvine, CA 92715 
(714) 752-5215 
TLX 655-316 


5.4.4 Hewlett-Packard 
Microprogram Development Support 


HP 64276 Microprogram Development Subsystem 


Description 


The HP 64276 Microprogram Development Subsystem 
and the HP 64320S 25 MHz Logic State/Software Ana- 
lyzer provide run control and real-time analysis for the 
AMD Am29300 family. As integrated subsystems of the 
HP 64000 Logic Development System, the HP 64276 
and the HP 64320S add the power of run control and 
analysis to all phases of the design, development, and 
maintenance of Am29300-based products. 


The Microprogram Development Subsystem consists of 
three components: a Run Control module, a Writable 
Control Store (WCS), and a 25 MHz Logic State/Soft- 
ware Analyzer. Run Control provides program flow con- 
trol, clock control, and break event detection. Writable 
Control Store provides high speed RAM for storing the 
microcode to be executed. A 25 MHz Logic State/Software 
Analyzer monitors system buses and provides 
trigger, store, and sequencing functions for locating 
problems in the microprogram. Integration of the 
Microprogram Development Subsystem with other powerful 
HP 64000 analysis and emulation tools allows for 
interactive, cross-triggered measurements in complex 
multiprocessor environments. 


Features 


• The choice of clock control or real-time address jam at break detection offers flexible target system control. 
• Address ranging and two-level sequencing provide powerful break event specification. 
• Real-time, nonintrusive analysis of microprogrammed system activity reduces software development time. 
• Flexible user-definable microassembler provides support for a wide variety of Am29300-based designs. 
• Microcode source interleaved with analyzer trace data speeds software debugging. 
• Linking of separately assembled microcode modules accelerates software turnaround time. 
• MACRO instruction feature of the microassembler improves software engineering productivity. 
• Modular architecture permits specific Writable Control Store configurations for customized development tool needs. 
• Integration of Run Control and analysis capabilities simplifies operation. 
• Interaction with other HP 64000 System Emulators and analyzers provides real-time analysis in multiprocessor environments. 


Run Control 


Run control provides system clock control, break
event specification, and address jamming. These important
features improve debugging of Am29300-based
systems.


Architecture


The Run Control module taps into the clock lines on the
target system to obtain the greatest level of clock control.
Clock control functions allow you to start and stop the
clock, single step, or break on a specific clock edge or
pattern.


The Run Control module provides 20 I/O lines to probe 
the address bus, monitor status bits, or drive control 
lines. These I/O lines are bused internally to the Writable 
Control Store and the state analysis data probe connec- 
tors on the Run Control module. 


Both single leads and coaxial cable leads are supplied for
probing the clock and control lines between the target 
system and the Run Control module. Coaxial leads are 
recommended for use with higher clock rates to ensure 
better signal quality. 


Clock Control 


Precise specification of clock edges and relationships is 
critical for breaking or halting the clock in target systems 
with multiple clock signals. The Run Control Module 
allows you to specify complex clock signal characteris- 
tics for use in break events. 


Address Jamming 


Address jamming forces program execution at a specific 
address if a starting point other than a system reset 
vector location is desired. For example, to force the 
execution of a monitor routine that displays the registers, 
an address is jammed onto the address bus, causing the 
program to jump to the monitor routine. With the HP 
64276 Microprogram Development Subsystem, you can 
jam either 8, 12, 16, or 20 address lines. 


Break Events 


The HP 64276 allows you to initiate a break event after
the detection of any of the following occurrences: an
address pattern (up to four can be specified), an address
range, or a two-term sequence of an address pattern,
range, or both. The state analysis trigger also can enable
break event detection. When a break event occurs, an
address can be jammed onto the address bus (e.g., to a
monitor program) or the system clock can be stopped.
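

The three kinds of break event described above can be modeled in a few lines of software. The following is only an illustrative sketch, not HP's implementation: the names BreakSpec and find_break are hypothetical, and the two-term sequence is simplified to two exact addresses (the subsystem also accepts ranges as sequence terms).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class BreakSpec:
    """Hypothetical break-event specification (names are illustrative)."""
    patterns: List[int] = field(default_factory=list)   # up to four addresses
    addr_range: Optional[Tuple[int, int]] = None         # inclusive (low, high)
    sequence: Optional[Tuple[int, int]] = None           # first term, then second

    def __post_init__(self) -> None:
        if len(self.patterns) > 4:
            raise ValueError("at most four address patterns can be specified")


def find_break(spec: BreakSpec, addresses: Sequence[int]) -> Optional[int]:
    """Return the index of the first address that raises a break event."""
    armed = False  # becomes True once the first sequence term has been seen
    for i, addr in enumerate(addresses):
        if addr in spec.patterns:
            return i
        if spec.addr_range is not None:
            low, high = spec.addr_range
            if low <= addr <= high:
                return i
        if spec.sequence is not None:
            first, second = spec.sequence
            if armed and addr == second:
                return i
            if addr == first:
                armed = True
    return None
```

In this model, the index returned is the point at which an address would be jammed or the clock stopped.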
Writable Control Store


The Writable Control Store (WCS), the memory array for 
the system microcode, consists of a dual port RAM that 
allows easy microcode downloading from the assembly 
environment and high-speed access of the microcode by 
the microprogram target system. Target system develop- 
ment and debugging is more efficient using the WCS 
instead of the target system control store. 


Architecture 


The Writable Control Store (WCS) contains either one or 
two 32 kbyte memory boards. Each board can be config- 
ured into one of three array sizes: (bits wide by words 
deep) 16 by 16K, 32 by 8K, or 64 by 4K. With two WCS 
boards in the subsystem, the microword widths are 
doubled. 
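

The array sizes quoted above follow directly from the 32 kbyte board capacity. The short sketch below checks the arithmetic; the function name wcs_configs is hypothetical, and a kbyte is assumed to be 1024 bytes.

```python
BOARD_BITS = 32 * 1024 * 8  # one 32 kbyte WCS board, expressed in bits


def wcs_configs(boards: int = 1):
    """Return (bits wide, words deep) for each array size with 1 or 2 boards."""
    if boards not in (1, 2):
        raise ValueError("the WCS holds either one or two memory boards")
    configs = []
    for width in (16, 32, 64):
        depth = BOARD_BITS // width              # words available at this width
        configs.append((width * boards, depth))  # a second board doubles the width
    return configs
```

With two boards the widest configuration is 128 bits by 4K words.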


The WCS address is obtained from the Run Control 
module, eliminating the need to probe the target system 
a second time. By using one of the WCS address lines as 
an enable control to three-state the WCS output, you can 
toggle between target memory and subsystem memory. 


Load 


Once microcode has been assembled and linked, it is 
downloaded from the software development environ- 
ment to the Writable Control Store for execution. Trans- 
ferring microcode is fast and easy with the integrated 
development and hardware execution environments of 
the Microprogram Development Subsystem. 


List 


When debugging microcode, you can examine the con- 
tents of the WCS and list them to a destination file, a 
printer, or a display. A single list command specifies from 
one to four addresses or groups of contiguous WCS 
addresses. Displaying the address ranges allows you to 
examine and compare the microcode in different subrou- 
tines. 


Modify 


While debugging, you can modify the absolute code and 
continue debugging. Modify can be specified for up to 32 
bits at a time for either a single WCS address or a range 
of addresses. 


Save 


The absolute code stored in WCS can be saved to a disc
file for later reloading or for verifying the correctness of 
changes to source microcode. 


User-defined 


You can design a custom WCS array and combine it with
the other modules of the Microprogram Development
Subsystem. The combination of the HP 64000 Logic
Development System, the HP 64276 Run Control, and
the user-defined WCS array provides an integrated
development solution for all Am29300 microprogram
target systems.


The user-defined WCS interface supports any array size 
between 16 by 512K and 1024 by 8K (bits wide by words 
deep). The interface between the HP 64000 mainframe 
and the user-definable WCS consists of control lines and 
parallel address and data buses that allow data to be 
written to or read from the WCS. User-definable control 
sequences can be transmitted to the user’s WCS preced- 
ing and following an upload or download operation. 


25 MHz Logic State/Software Analyzer 


The HP 64320S 25 MHz Logic State/Software Analyzer 
adds high-speed, real-time, nonintrusive software analy- 
sis to the HP 64000 Logic Development System. This 
flexible analyzer works well in microprogram software 
analysis, general-purpose software analysis, and sys- 
tem integration. Measurement results are displayed in 
source microcode (including MACROs and comment 
lines) or in user-defined symbols that minimize the need 
to decode captured data. The analyzer can also refer- 
ence symbols from the microprogram source files for 
easy specification and interpretation. 


Architecture 


The analyzer can be configured for 30, 60, or 90 channels
of data acquisition. Each configuration must have a 
control card and from one to three data acquisition cards 
containing 30 data acquisition channels. The following 
table contains the analyzer’s configurations. 


Number of Input    Control    30-Channel
Channels           Cards      Cards
30                 1          1
60                 1          2
90                 1          3


Format Specification 


The Format Specification establishes the conditions and 
relationships of target system signals transmitted to the 
analyzer through the clock and data input channels. 
User-defined labels up to fifteen characters long can be 
assigned to signal groups from one to 32 contiguous 
channels wide. Saving the Format Specification to the 
disc eliminates respecifying data channel labels, thresh- 
old levels, and clock characteristics each time the ana- 
lyzer is used. After a label is assigned to a group of input 
channels, it also appears on the analyzer softkeys. 
To avoid confusion caused when both positive and
negative true data are present in the system under test, 
the 25 MHz analyzer can automatically complement any 
group of data channels. You do not need to invert these 
signals on the target system or complement data as 
measurements are specified and results are interpreted. 


The analyzer has two separate clock inputs. Data can be 
captured on the positive and negative edges of both 
clocks. With two clocks, you can analyze systems with 
multiple CPUs by capturing data on each processor's 
address strobe signal. 


Data and clock signal switching threshold voltages can 
also be varied. Appropriate thresholds for TTL and ECL 
logic families have been preprogrammed. You can also 
select other values between -10 and +10 volts, in 100 mV 
increments for monitoring several different logic families. 
Independent threshold specifications can be made for 
each acquisition board (30 data channels). 


Map Specifications 


The Map Specification greatly simplifies measurement 
setups and trace data interpretation by replacing raw 
captured data with user-defined symbols. A “symbol 
map” can be associated with any labeled input channel 
via the Format Specification. Entries in a symbol map 
appear as part of the analyzer’s softkey syntax and in the 
displays of measurement results. Map symbols are de- 
fined as constants, patterns, or ranges. A map symbol 
can be defined in terms of source file line numbers or 
user-symbols from microprogram source files. 


Trace Specification 


The Trigger function determines when the analyzer will 
capture data. Complex triggering conditions can be 
implemented using sequence terms. A “term” is defined 
as “AND’ed” constants and patterns. A constant can be 
an integer, map symbol, or symbol from the micropro- 
gram source file. A pattern is an integer with embedded 
“don’t cares” (e.g., 0100xxxxB). Four sequence terms 
(trigger being the fourth) are available. Each sequence 
term can be set up to occur from 1 to 65,536 times before
it is satisfied. A restart term is also available for resetting 
the sequencer. 
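

The pattern and sequence-term mechanism can be illustrated with a small software model. This is a sketch under stated assumptions, not the analyzer's actual logic: the names parse_pattern, matches, and find_trigger are hypothetical, each term is reduced to a single pattern, one sample advances the sequencer by at most one term, and the restart term is omitted.

```python
def parse_pattern(text):
    """Convert a binary pattern such as '0100xxxxB' into a (mask, value) pair."""
    if not text.upper().endswith("B"):
        raise ValueError("expected a binary pattern ending in 'B'")
    mask = value = 0
    for ch in text[:-1]:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1            # this bit must match
            value |= int(ch)
        elif ch not in "xX":     # 'x' marks a don't-care bit
            raise ValueError("unexpected character in pattern: " + ch)
    return mask, value


def matches(pattern, sample):
    """True if the sample agrees with the pattern on every cared-for bit."""
    mask, value = pattern
    return (sample & mask) == value


def find_trigger(terms, samples):
    """terms is a list of (pattern, count) pairs; the last one is the trigger.

    Returns the index of the sample that satisfies the trigger term, or None.
    """
    if not 1 <= len(terms) <= 4:
        raise ValueError("between one and four sequence terms are available")
    for _, count in terms:
        if not 1 <= count <= 65536:
            raise ValueError("occurrence count must be from 1 to 65,536")
    stage = 0   # index of the term currently being looked for
    seen = 0    # occurrences of that term so far
    for i, sample in enumerate(samples):
        pattern, count = terms[stage]
        if matches(pattern, sample):
            seen += 1
            if seen == count:
                stage += 1
                seen = 0
                if stage == len(terms):
                    return i
    return None
```

For the example pattern 0100xxxxB the mask is F0 hex and the value is 40 hex, so any sample from 40 hex through 4F hex matches.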


The Trigger Enable function specifies when the analyzer 
monitors data for a trigger event. The trigger event can be 
stored anywhere within the trace memory buffer, allow- 
ing trace data to be stored either preceding, surrounding, 
or following the trigger event. The Store function
determines what data should be stored. You can specify
up to four OR’ed terms with each term consisting of
AND’ed constants and patterns. When the restart term is
used for sequencing, the maximum number of OR’ed
terms is three. The optional store with “sequence protect”
specifies that the sequence events be saved before any
pre-trigger events are stored.


Measurement Results 


The HP 64320S 25 MHz Logic State/Software Analyzer 
provides a high degree of display flexibility. When using 
source display, the microcode is visible without having to 
probe the microword: microword fields, MACRO invocations,
and comments from source files are displayed. The
display shows these source level statements combined 
with target data probed by the analyzer. This combination 
of program and data makes microcode debug more 
productive and efficient. Displays can also include user- 
defined symbols specified in the symbol maps and can 
automatically reference microassembler symbol tables 
generated during software development. These symbols 
can be displayed in the trace listings. 


Flexible Probing Capability 


The HP 64320S analyzer’s clock cable and two of its data 
probes plug directly into the HP 64276 Microprogram 
Development Subsystem to eliminate double probing of 
the Am29300-based target system. Run Control, WCS, 
and the other state analysis data probes connect to the 
target system by general-purpose wire grabbers or D- 
type coaxial cables. The coaxial cables offer better high- 
frequency signal quality and a more reliable connection 
to the target system. 


Measurements Involving Multiple Analyzers


Measurements with the HP 64320S and other HP 64000 
analysis subsystems relate microcode execution to other 
software and hardware events. These interactive meas- 
urements are conducted via the high-speed intermodule 
bus (IMB). The IMB carries the following five signals 
between the analysis subsystems: 


                   Received by    Driven by
IMB Signal         HP 64320S      HP 64320S
Master Enable      yes            yes
Trigger Enable     yes            yes
Trigger            yes            yes
Storage Enable     yes            no
Delay Clock        no             yes
The Master Enable signal coordinates measurement
starts with other analyzers and emulators. When the
analyzer is set up to receive this signal and the Master 
Enable is “false,” the analyzer is completely disabled and 
will not capture data. When Master Enable becomes 
“true,” the analyzer begins examining data. 


The Trigger Enable operates in the same way as Master 
Enable by informing the receiving analysis module when 
it can begin looking for its trigger condition. 


The Trigger signal, when received, causes the analyzer 
to immediately trigger and complete its measurement. 
For example, this is valuable for using the HP 64610S 
high-speed Timing/State Analyzer in conjunction with the 
25 MHz Logic State/Software Analyzer to determine if a 
spurious signal pulse is related to a microcode event. By 
triggering the 25 MHz analyzer on a hardware event, the 
microcode execution surrounding the pulse is quickly 
pinpointed and evaluated. 


The Storage Enable signal exercises hierarchical control 
over the store specification. 


Microassembler 


The HP 64276 Microprogram Development Subsystem 
includes a user-definable microassembler and linker 
capable of generating microwords up to 128 bits in width 
which support Am29300 family devices. The linker al- 
lows assembly of separate modules, reducing turn- 
around time for source microcode changes. 


The definition language operates on a 32-bit, 40-register
pseudo machine with standard instructions for the movement
and manipulation of data. In addition, higher level
commands for standard tasks are also provided (e.g.,
commands such as GET_TOKEN, FIND_DELIMITER,
and GET_OPCODE support lexical analysis). The
user-definable microassembler can also generate relocatable
code with the use of the GEN_CODE command. The
ERROR and WARNING commands print messages
from a fixed table to the listing file to simplify error
detection and correction. Field names and their values
are easily specified (e.g., SEQ = CONT).


The definition language is powerful enough to allow the 
creation of a customized microassembler capable of: 


• Generating code
• Specifying default values for missing fields
• Issuing errors for missing fields not having a
  default value
• Issuing errors for overlapping field definitions
• Issuing errors and warnings for architectural
  inconsistencies, such as a microinstruction that
  could cause bus contention
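

The first four capabilities in the list above can be illustrated with a small field-packing sketch. It is written in Python rather than in the HP definition language, and everything in it is hypothetical: the field names, bit positions, and mnemonic values are invented for the example and are not real Am29300 encodings.

```python
class FieldError(Exception):
    """Raised for missing, unknown, or overlapping microword fields."""


# Hypothetical layout: name -> (lsb, width, default or None, {mnemonic: value})
FIELDS = {
    "SEQ":  (0, 4, 0x1, {"CONT": 0x1, "JUMP": 0x2}),
    "ALU":  (4, 4, 0x0, {"NOP": 0x0, "ADD": 0x3}),
    "DEST": (8, 4, None, {"R0": 0x0, "R1": 0x1}),   # no default: must be given
}


def check_layout(fields):
    """Issue an error if two field definitions claim the same bit."""
    owner = {}
    for name, (lsb, width, _default, _values) in fields.items():
        for bit in range(lsb, lsb + width):
            if bit in owner:
                raise FieldError(f"field {name} overlaps {owner[bit]} at bit {bit}")
            owner[bit] = name


def assemble(fields, statement):
    """Pack a statement such as 'SEQ = CONT, DEST = R1' into one microword."""
    check_layout(fields)
    given = {}
    for part in statement.split(","):
        if part.strip():
            name, _, mnemonic = part.partition("=")
            given[name.strip()] = mnemonic.strip()
    for name in given:
        if name not in fields:
            raise FieldError(f"unknown field {name}")
    word = 0
    for name, (lsb, width, default, values) in fields.items():
        if name in given:
            if given[name] not in values:
                raise FieldError(f"unknown value {given[name]} for field {name}")
            value = values[given[name]]
        elif default is not None:
            value = default                       # default for a missing field
        else:
            raise FieldError(f"missing field {name} has no default value")
        word |= (value & ((1 << width) - 1)) << lsb
    return word
```

Architectural checks such as bus contention depend on the target design and are not modeled here.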


The resulting customized microassembler recognizes 
the syntax specified in the definition stage. Standard
capabilities are predefined for the microassembler and
need not be explicitly specified in the definition stage. 
For example, standard pseudo-ops are provided for 
storage allocation, location counter control, and listing 
format control. In addition, a powerful MACRO facility 
is supported. 
5.5 SIMULATION MODELS


Logic Automation, Inc. 
Simulation Models for Hardware and 
Software Verification 


The freedom and flexibility that have always been the 
benefits of designing with microprogrammed devices are 
now supported by a new generation of computer-aided 
design tools. 


Advanced Micro Devices, Inc. and Logic Automation 
Incorporated have entered into a Library Development 
Relationship. This agreement has made it possible to 
model many of the latest AMD devices and make them 
available to designers. Table 5-2 includes all of
the Am29300 family.


Many other Advanced Micro Devices models are also 
available from Logic Automation; the entire AMD model 
list appears at the end of this section. These simulation 
models have been developed by Logic Automation with 
the cooperation of Advanced Micro Devices. Each model 
is based on information provided by AMD and verified
with the same vectors that are used to test the actual part.
Each model is a SmartModel, capable of performing 
usage and timing checks that will significantly improve 
your ability to debug, verify, and optimize your designs. 


SmartModel Simulation Benefits 


Simulation models from Logic Automation are called 
SmartModels because they are behavioral language 
models with built-in intelligence. This concept (that
information about VLSI devices is most effective when it is
available inside the models used to simulate complex
systems) was introduced and pioneered by Logic
Automation. SmartModels allow you to use a workstation and
logic simulator to verify your designs at the systems level. 


Design cycles are shorter because the simulations catch
many errors, both subtle and obvious, before the first
prototype is built. Cycles are also shortened because
SmartModel simulations are fast. They are easy to use, and they
are designed to maximize the effects of your simulation
runs. Simulation runs are also critical as the first step in
developing test vectors that must be used later to verify
production systems.


Table 5-2

Description                 TTL        CMOS        ECL
32-Bit Integer Multiplier              Am29C323
Floating Point Processor    Am29325    Am29C325
16-Bit Sequencer            Am29331    Am29C331
32-Bit ALU                  Am29332    Am29C332
Register File               Am29334    Am29C334    Am29434
Bounds Checker              Am29337
Byte Queue                  Am29338
Figure 5-8. Microprogrammed Product Development Cycle (without simulating)


Figure 5-9. Microprogrammed Product Development Cycle (with simulating)


SmartModel Simulations Postpone Prototyping 


Without simulating, microprogrammed product development
requires hardware prototype development
very early in the process. As shown by the shading in the
process blocks of Figure 5-8, only the overall
design and hardware design (plus schematic capture)
can be completed without breadboarding. Contrast
this situation with the same process diagrammed in
Figure 5-9.


Simulating permits far more of the product development 
cycle to take place before the first hardware prototypes 
are necessary. First of all, the simulation takes the place 
of the breadboarded hardware that would have been 
necessary for integration. In addition, short sections of 
code generated in a high level language using existing 
software development tools can also be executed in the 
simulation environment to help in the initial phase of 
system verification. 


SmartModel Simulations Are Fast 


Simulations with behavioral language models run fast. 
The demonstration circuit used below is a simple graph- 
ics processor designed using AMD’s new 32-bit building 
block Am29300 family: the Am29331 sequencer, 
Am29332 ALU, Am29325 floating point processor, and 
two Am29334 dual-port register files. There are a total of 
39 ICs in the schematic including 4 Am29827 10-bit 
buffers, 12 Am29841 10-bit latches, and 8 Am27S35 
registered PROMs. In addition the design contains an 
abstracted behavioral language model of a display 
memory that is equivalent to eight SRAMs. 


Figure 5-10 is a screen print of a simulation running 
under Mentor Graphics QuickSim 5.1. A timing diagram 
in a trace window occupies the width of the screen at the 
top. The QuickSim menu window is below left; next is
a list window showing a few of the circuit lines against
simulation time. In the lower lefthand corner, there is a
transcript window containing messages written by one of
the SmartModels in the circuit. The lower righthand
corner of the screen shows the schematic.


The circuit executes microcode out of ROM to plot the
pixels that make up a line on a display. The pseudo-code
for the line-plotting algorithm is below.


x, y, deltax, deltay <- FIFO (1,2,3,4)
e <- 2 * deltay - deltax
for i = 1 to deltax do begin
    plot (x,y)    {XOR in pixel(x,y) into bitmap}
    if e > 0 then begin
        y <- y + 1
        e <- e + (2 * deltay - 2 * deltax)
    end
    else
        e <- e + 2 * deltay
    x <- x + 1
end for
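

For readers who want to experiment with the algorithm outside the simulator, the pseudo-code translates almost line for line into a high level language. The sketch below is an illustration only: the function name line_pixels is hypothetical, the pixels are collected in a list instead of being XORed into a bitmap, and the arguments are passed directly rather than read from a FIFO. Like the pseudo-code, it assumes a line in the first octant (0 <= deltay <= deltax).

```python
def line_pixels(x, y, deltax, deltay):
    """Return the pixels of a line from (x, y), stepping deltax times in x."""
    pixels = []
    e = 2 * deltay - deltax          # initial error term
    for _ in range(deltax):
        pixels.append((x, y))        # stands in for plot(x, y)
        if e > 0:
            y += 1                   # step diagonally
            e += 2 * deltay - 2 * deltax
        else:
            e += 2 * deltay          # step horizontally only
        x += 1
    return pixels
```

Only additions, subtractions, and sign tests appear inside the loop, which is why the algorithm maps well onto a microcoded ALU and sequencer.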


Run on an Apollo DN000 with Mentor Graphics QuickSim
Version 5.1, the circuit ran through that algorithm execut- 
ing the equivalent microcode at a rate of 34 microcode 
instructions per minute at a 1 ns resolution. Note that this 
was an exercise of the entire design, a true system-level 
benchmark. 


SmartModels Are Easy To Use 


SmartModel simulations are effective because these 
models are designed to make the most of every simula- 
tion run. For example, some users of simulation tech- 
niques have noted that analyzing computer printouts of 
logic values is tedious and very time-consuming. Using 
SmartModels eliminates that problem. During the initial 
stages the models’ functional checks pinpoint usage 
errors. Later in the design process, the timing checks are
usually more pertinent. In both cases, the models use
messages on the workstation screen to pinpoint the
exact problem by time and schematic instance. This
unique feature of simulation models from Logic Automation
is called Symbolic Hardware Debugging.


Symbolic Hardware Debugging is a series of checks 
which write error or warning messages in the transcript 
window during your simulation runs. There are two types: 
functional checks and timing checks. The functional
checks vary greatly with the device type, but essentially
they help make sure a chip is being used correctly. For
example, a DMA controller will include a check on 
whether or not all internal modes and registers were 
initialized. A DRAM check will produce a message like: 
“WE was low at the RAS falling edge.” 


The timing checks can include set-up, hold, frequency,
pulse width, recovery time, etc., as applicable to the
component and as specified by the semiconductor
vendor’s current data sheet. A 1 megabit x 1 DRAM
model, for example, contains about 50 different timing
checks.


Both kinds of checks produce Symbolic Hardware De- 
bugging messages that are very specific. A setup time 
violation, for example, will cause an error message that 
documents: pin name; device, by instance, reference 
designator, and component name; sheet name; design 
name; simulation time; signals and edges, as appropri- 
ate; and setup times, both as they occurred and as 
required by the vendor’s data sheet. 
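

The idea behind a setup-time check can be shown with a simplified sketch. This is not Logic Automation's code, and the message wording is invented for the example; the function name check_setup, the pin and instance labels, and the use of integer nanosecond timestamps are all assumptions.

```python
def check_setup(pin, instance, data_edges, clock_edges, required_ns):
    """Report every clock edge that follows a data change too closely.

    data_edges and clock_edges are lists of times in nanoseconds.
    """
    messages = []
    for clk in clock_edges:
        prior = [t for t in data_edges if t <= clk]
        if not prior:
            continue                     # data never changed before this edge
        actual = clk - max(prior)        # time from last data change to clock
        if actual < required_ns:
            messages.append(
                f"Setup violation on {pin} of {instance} at {clk} ns: "
                f"actual {actual} ns, required {required_ns} ns"
            )
    return messages
```

A full model would add the sheet name, design name, and signal edges listed above, and would perform the corresponding hold, pulse width, and frequency checks in the same manner.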


Symbolic Hardware Debugging means your simulation 
runs give you answers, not just binary data which you 
have to painstakingly decode and compare to the IC data 
books. 


Figure 5-11. Symbolic Hardware Debugging in the AMD
32-Bit Building Block Family SmartModels


Messages like the one in Figure 5-11 speed your design
debugging and verification during simulation runs. In this case, a
check for an illegal operation has been built into the
model; the operation can occur if the first instruction in an
interrupt service routine is a stack operation. Besides a
service routine that starts with a stack operation, this 
error message might be caused by an incorrect interrupt 
vector that caused a jump to any location that contained 
a stack operation. Similarly, the Am29334 SmartModel 
will signal if the write address changes during a write 
cycle; the model will issue a warning and write the data 
to all the locations involved so that the simulation run can 
continue. Many other function checks are built into these 
models. For the Am29300 family SmartModels, there are 
setup and hold timing checks for each input pin except 
the clock. For the clock, there are pulse width and 
frequency checks built into the models. Pulse width 
checks for the Write Enable and Data Latch Enable pins 
are also written into the Am29334 model. 


SmartModels Make Your Simulations 
More Efficient 


SmartModels maximize your simulations because they
are adept at handling X’s (unknowns). Depending on
where it occurs in the circuit, one unknown can spread
X’s throughout your simulation. When that happens, your
run is less useful than it could be because later events are
buried in X’s. To gain more information, you fix the first
problem and rerun the simulation. SmartModels are
designed not to generate or propagate X’s unnecessarily;
with Symbolic Hardware Debugging, the use of X’s
can be very judicious. Our engineers anticipate when an
“X” is truly a “don’t care” and keep your simulations useful
as long as possible while always issuing a warning
message to document the event.
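

The principle of not propagating unknowns unnecessarily can be illustrated with ordinary three-valued logic. The sketch below shows the general technique only and makes no claim about how SmartModels are implemented; the names X, and3, or3, and mux are chosen for the example.

```python
X = "X"  # the unknown value; known values are the integers 0 and 1


def and3(a, b):
    """AND: a known 0 on either input decides the output."""
    if a == 0 or b == 0:
        return 0
    if a == X or b == X:
        return X
    return 1


def or3(a, b):
    """OR: a known 1 on either input decides the output."""
    if a == 1 or b == 1:
        return 1
    if a == X or b == X:
        return X
    return 0


def mux(sel, a, b):
    """Select a when sel is 0 and b when sel is 1.

    With an unknown select, the output is still known if both data inputs
    carry the same known value; here the X is truly a don't care.
    """
    if sel == X:
        return a if (a == b and a != X) else X
    return b if sel else a
```

A pessimistic model would return X whenever any input is X; the versions above do so only when the unknown can actually affect the result.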


SmartModels Are Accurate 


The Logic Automation and Advanced Micro Devices 
Library Development Relationship means that AMD 
supplies our model builders with advance information 
and with the test vectors used for the actual chips. We 
use the test vectors to certify that the SmartModels are 
accurate simulations of the AMD components. 


SmartModels Represent Good Values 


Multiple Timing Versions 


Every SmartModel includes the correct timing for all 
available speed versions. An example is the Am29C323; 
the SmartModel for that part contains the Am29C323, 
Am29C323-1, and Am29C323-2 timing versions. 


Maintenance 


A maintenance agreement will keep your models 
current automatically. When CAE companies update 
their simulators and workstation operating systems, your 
models will be updated. Because Logic Automation 
works with the CAE companies prior to the new software 
release, you will generally have new SmartModels in 
your hands before you’re ready to upgrade your system. 
If you have a maintenance agreement, Logic Automation 
will also automatically update your SmartModels when 
the manufacturer changes specifications or adds new 
timing versions. 


Documentation and Support 


SmartModels are very easy to install and use. Full 
documentation is provided with each set ordered. In- 
cluded are: installation instructions; SmartModel Library 
Users Guide; data sheets on each model; and relevant 
application notes. In addition, our Applications Engineers 
are ready to help you with any questions at 503-690- 
6900. 


SmartModels Are Available For Designs Now 


Logic Automation has more than 250 timing versions of 
about 100 Advanced Micro Devices components that run 
on popular CAE workstations available now. 


EPROMs 


Am27128A 16Kx8 

Am27LS191, included with Am27S191 
Am27PS191/A, included with Am27S191
Am27S191/A/SA 2Kx8 
PROMs


Am27S19/A 32x8
Am27S25 512x8
Am27S291A 2Kx8
Am27S35/A 1Kx8
Am27S37/A 1Kx8
Am27S45/A 2Kx8
Am27S47/A 2Kx8


Bit-Slice Family


Am2901B/C 4-bit slice
Am2902A carry/look-ahead
Am2903A 4-bit slice


Static RAMs 


Am2130 1Kx8, dual port 
Am2168 4Kx4 

Am2169, included with Am2168
Am27519 64Kx1 

Am9114 1Kx4 

Am9124, included with Am9114 
Am9128 2Kx8 

Am9150 1Kx4 

Am9151 1Kx4 

Am91L14, included with Am9114 
Am91L24, included with Am9114
Am93L422 256x4 


Support 


Am29114 real-time interrupt controller 
Am2914 interrupt controller 

Am2952 8-bit bidirectional I/O port 
Am2953/A 8-bit bidirectional I/O port
Am2965 octal driver

Am2966 octal driver 

Am8237A DMA controller 

Am9513A system controller 
Am9517A DMA controller 

AmZ8073 system controller 
AmZ8530 serial controller 


32-Bit Building Blocks 


Am29C323 32-bit multiplier 
Am29325 floating point processor 
Am29C325 floating point processor 
Am29C331 16-bit sequencer 
Am29331 16-bit sequencer 
Am29332 32-bit ALU 

Am29C332 32-bit ALU 

Am29334 register file 

Am29434 register file 

Am29337 bounds register 
Am29338 byte queue 
Bit-Slice Family (continued)


Am2909 microprogram sequencer
Am2910/A microprogram controller 
Am29116/A 16-bit microcontroller 
Am2911A microprogram sequencer 
Am2940 DMA address generator 
Am2942 timer/counter/DMA address generator 
Am29520 pipeline register 

Am29521 pipeline register 

Am2960 error detection and correction 
Am29C10 microprogram controller 
Am29L116, included with Am29116/A 


Multipliers & ALUs


Am25S557 8-bit multiplier

Am25S558 8-bit multiplier

Am29C323, see 32-bit building blocks category 
Am29332, see 32-bit building blocks category 
Am29516 16-bit multiplier 

Am29517 16-bit multiplier 

Am29L516 16-bit multiplier 

Am29L517 16-bit multiplier 


Programmable Logic Devices 


AmPAL18P8 PAL 
AmPAL22V10/A PAL 


Am29PL141 fuse programmable controller


Am29800 Family 


Am29806 6-bit chip select decoder 

Am29809 9-bit equal-to comparator 

Am29818 shadow register/WCS pipeline register 
Am29821/A/Am29C821 10-bit register 
Am29822/A 10-bit register (inverting) 
Am29823/A/Am29C823 9-bit register
Am29824/A 9-bit register (inverting) 

Am29825/A 8-bit register 

Am29826/A 8-bit register (inverting) 
Am29827/A/Am29C827 10-bit bus buffer 
Am29828/A/Am29C828 10-bit bus buffer (inverting) 
Am29833/A/Am29C833 parity bus transceiver 
Am29834/A/Am29C834 parity bus transceiver 
(invert register) 

Am29841/A/Am29C841 10-bit bus interface latch 
Am29842/A 10-bit latch (inverting)
Am29843/A/Am29C843 9-bit latch
Am29844/A 9-bit latch (inverting) 

Am29845/A 8-bit latch 

Am29846/A 8-bit latch (inverting) 
Am29853/A/Am29C853 parity bus transceiver 
(noninverting latch) 

Am29854/A/Am29C854 parity bus transceiver 
(inverting latch) 

Am29861/A/Am29C861 10-bit transceiver 
Am29862/A 10-bit transceiver (inverting) 
Am29863/A/Am29C863 9-bit transceiver 
Am29864/A 9-bit transceiver (inverting) 
Models are added every week, so call to get the latest
catalog or price and delivery information: 


Logic Automation Incorporated 

P. O. Box 310 

Beaverton, OR 97075 

Tel: (503)690-6900. Fax: (503)690-6906. 


East Coast sales office: 


Park View Office Building, Suite 400 
10480 Little Patuxent Parkway 


Columbia, MD 21044-3502 
Tel: (301)740-8704. 
5.6 C COMPILER SUPPORT


Introduction


With the advent of the Am29300 Family, it has become
relatively easy to design bit slice systems controlled by
very large amounts of microcode.


When it is expected that a fair amount of application
microcode must be written, when speed of application
development is important, or when some measure of
portability is desired, then a microcode compiler can be
an invaluable, if not essential, tool.


In this section, we discuss compiler implementations
from two different angles. To begin with, we will discuss
some of the decisions to be made when implementing a
compiler for a specific architecture. Then we will discuss
what hardware features are desirable to support the
implementation of a compiler.


Before going any further, we should note that we do not
believe that a microcode compiler can by itself provide a
complete solution to the problem of writing code for bit
slice systems. If you want to implement a general purpose
language, you must design a general purpose
processor. If you have not designed a general purpose
processor, then it may be pointless to try to implement a
compiler for your hardware. Even if your hardware is an
ideal target for a compiler, there will inevitably be a need
to code some small portion, at least, in assembler. In
short, a microcode compiler is a tool, but not a panacea.


The Microcode C Compiler


The language we use is called Microcode C. It is similar
enough to the C language that a programmer who
already knows C can start programming in Microcode C
after as little as one day’s study.


The Microcode C compiler must be customized, which
basically means that we have to write a code generator
for your hardware, after making certain design decisions
based on your needs and the capabilities of your
hardware.


The compiler generates micro-assembler code as its
output. If you already have a microcode assembler, then
we can arrange to generate the mnemonics used by your
assembler. Otherwise, we can generate code for Bit Slice
Software’s standard microcode assembler.


To date, we have developed about 12 different Microcode
C compilers. These have variously been installed
under PC-DOS, VMS, and/or Unix.


Types


All Microcode C compilers support a common data type:
the signed integer whose width corresponds to the width
of the processor. Typically, the width is 16 or 32 bits.
Usually the types short and long are treated the same as
int. Structures, unions, and arrays are supported, but
sometimes with restrictions.


Other types are supported if desired and if the hardware 
permits. The type char can be reasonably supported if 
the basic memory architecture allows byte addressing. 
Since most microarchitectures use word oriented ad- 
dressing, char is most often simply treated as int. The 
type unsigned can be supported if condition codes for 
unsigned comparisons are efficiently implemented. The 
types float and double are usually implemented only if 
there is floating point hardware to support them. How- 
ever, they can also be implemented if software floating 
point routines are written. 


Storage class 


All Microcode C implementations support the storage 
class static. The auto storage class is only supported if 
the hardware allows a reasonable implementation of a 
run time stack. If it is not possible to support a stack, then
local variables (which are normally allocated on a stack) 
are treated as static and recursive calls are not allowed. 
The extern storage class is supported if the assembler 
for which the compiler is generating code supports exter- 
nal references and definitions. 


Most micro-programmers lay great stress on maximizing 
their use of the machine registers. Microcode C supports
their desires by allowing them to declare variables with 
register storage class. Microcode C allows registers to 
be declared globally, as well as locally. Local register 
variables must be saved when a function call is made. 
Global registers never need to be saved or restored. 
They can be used to pass data between procedures in 
registers. 


Initialization 


The standard C syntax for static initialization of variables 
is supported. 


Expressions 


Each implementation supports all the standard C opera- 
tions defined for its supported types. Binary operations 
supported include integer addition, integer subtraction, 
logical left and right shifts, bitwise and, bitwise or, bitwise 
exclusive or, logical and, and logical or. Unary operations
include take address, indirect through address, one’s 
complement, logical negation, integer negation, and pre- 
and post- increment and decrement. Integer multiplica- 
tion, division, and remainder are supported when the 
micro-architecture encourages them. 


Statements 


All of the standard C statement types are supported,
including for, while, do, goto, switch, if, else, break,
continue, case, and default. The switch statement will
generate a jump table if the micro-architecture permits.
The compiler also supports a switchf statement, which
is like a switch except that it does not do a bounds check
on the switch value before passing it through the jump
table. Use of switchf instead of switch can save four or
five micro-instructions if the switch value is known to be
or forced to be in the range of the switch. For systems
whose sequencers (such as the Am29331) have a hard-
ware loop counter, the compiler supports a loop state-
ment, which is very useful for coding fast inner loops. For
Am29331-based systems, the compiler allows loop
statements to be nested.
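The difference between switch and switchf can be sketched in standard C by modelling the jump table as an array of function pointers. This is only an illustration of the idea: the handler names, the four-entry table, and the masking helper are assumptions made for the example and are not the code the compiler emits.

```c
#include <assert.h>

/* Hypothetical handlers standing in for the bodies of four case labels. */
static int case0(void) { return 10; }
static int case1(void) { return 11; }
static int case2(void) { return 12; }
static int case3(void) { return 13; }
static int dflt(void)  { return -1; }

typedef int (*handler)(void);
static const handler jump_table[4] = { case0, case1, case2, case3 };

/* switch: the value is range-checked before it indexes the jump table. */
int dispatch_checked(unsigned v)
{
    if (v >= 4u)            /* the bounds check that costs extra micro-instructions */
        return dflt();
    return jump_table[v]();
}

/* switchf: no bounds check; the caller guarantees that v is in range. */
int dispatch_unchecked(unsigned v)
{
    assert(v < 4u);         /* host-side guard only; the microcode has no such test */
    return jump_table[v]();
}

/* Forcing the value into range with a mask makes the unchecked form safe. */
int dispatch_masked(unsigned v)
{
    return dispatch_unchecked(v & 3u);
}
```

The masked form corresponds to the case where the switch value is "forced to be in the range of the switch": one AND operation replaces the comparison and conditional branch.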


Built-in functions 


Each micro-architecture has a unique interface to exter- 
nal buses, registers, and signals. Each Microcode C 
implementation supports this interface by providing a set 
of built-in hardware functions designed specifically for 
the particular implementation. These built-in functions 
behave like macros in that they are expanded in-line. A 
basic set of built-in functions might include: 


data = input( source );   - gets data from an external register
output( sink, data );     - sends data to an external register
cc( condition_code );     - tests a hardware condition code
memcycle( type );         - initiates a memory cycle


In this case, “source”, “sink”, “condition_code”, and
“type” would be chosen from a set of constants contained
in a standard file supplied with the compiler. Any special 
timing constraints (such as “you must wait two cycles to 
read back data after cycling the memory”) are enforced 
automatically by the compiler. 


One of the advantages of using built-in functions, as
opposed to adding new keywords to the language, is that
it is possible to debug microcode programs on the host
system using the standard C compiler, simply by writing
a small library of functions which are equivalent to the
built-in ones and which simulate the operation of the
target hardware.
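A host-side library of this kind might look like the following minimal sketch. The four function names come from the list above; everything else (the port and condition code constants, the sim_ helper functions, and the way the hardware state is modelled) is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical constants. A real implementation would take these from
   the standard file supplied with the compiler. */
enum { PORT_A, PORT_B, NUM_PORTS };
enum { CC_ZERO, CC_NEGATIVE, NUM_CODES };
enum { MEM_READ, MEM_WRITE };

static int ext_reg[NUM_PORTS];   /* simulated external registers */
static int cond[NUM_CODES];      /* simulated hardware condition codes */
static int cycles_started;       /* memory cycles initiated so far */

/* Host-side equivalents of the built-in functions. */
int input(int source)
{
    assert(source >= 0 && source < NUM_PORTS);
    return ext_reg[source];
}

void output(int sink, int data)
{
    assert(sink >= 0 && sink < NUM_PORTS);
    ext_reg[sink] = data;
}

int cc(int condition_code)
{
    assert(condition_code >= 0 && condition_code < NUM_CODES);
    return cond[condition_code];
}

void memcycle(int type)
{
    (void)type;                  /* the cycle type is ignored in this sketch */
    cycles_started++;
}

/* Hooks used only on the host to set up and inspect the simulated hardware. */
void sim_set_port(int port, int data) { ext_reg[port] = data; }
void sim_set_cc(int code, int value)  { cond[code] = value; }
int  sim_cycles(void)                 { return cycles_started; }

/* A routine written against the built-ins; it compiles on the host as is. */
int forward_if_zero(void)
{
    if (cc(CC_ZERO)) {
        output(PORT_B, input(PORT_A));
        memcycle(MEM_WRITE);
        return 1;
    }
    return 0;
}
```

Because forward_if_zero uses only the built-in names, the same source text could be given to the microcode compiler, where the calls would be expanded in-line instead of being linked against this library.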


Scratchpad RAM 


In order to allocate non-register variables, there must be 
some sort of an external scratchpad memory accessible 
to the compiler. When reference is made to a non- 
register variable, the compiler automatically generates 
the micro-operations needed to set up the address and 
write out or read back the data. 


Compaction 


All microcode compilers must do some form of compac- 
tion in order to take advantage of the parallelism usually 
inherent in the micro-architecture. Microcode C uses 
resource-based compaction on straight line code seg- 
ments. Operations are compacted in the order that they 
are generated by the compiler. An operation can be 
moved to precede a previously compacted operation if 
there is space for it and if no resource dependencies are 
detected while trying to move it. 
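The compaction rule described above can be sketched as follows. This is a simplified model, not the Microcode C implementation: the bit-mask encoding of microword fields and registers, the structure names, and the 16-word limit are all assumptions made for the example.

```c
#include <assert.h>

#define MAX_WORDS 16

/* One micro-operation: the microword fields it occupies and the
   registers it reads and writes, all encoded as bit masks. */
struct micro_op {
    unsigned fields;
    unsigned reads;
    unsigned writes;
};

/* One microword under construction: the union of its operations. */
struct micro_word {
    unsigned fields;
    unsigned reads;
    unsigned writes;
};

static struct micro_word words[MAX_WORDS];
static int nwords;

void compact_reset(void) { nwords = 0; }
int  compact_word_count(void) { return nwords; }

/* True if op must stay after word w: read-after-write, write-after-read,
   or write-after-write on any register. */
static int depends_on(const struct micro_op *op, const struct micro_word *w)
{
    return (op->reads & w->writes) != 0
        || (op->writes & w->reads) != 0
        || (op->writes & w->writes) != 0;
}

/* Place op in the earliest word it can reach; returns that word's index. */
int compact_add(const struct micro_op *op)
{
    int place = nwords;              /* default: a new word at the end */
    int i;

    for (i = nwords - 1; i >= 0; i--) {
        if (depends_on(op, &words[i]))
            break;                   /* cannot move into or past this word */
        if ((words[i].fields & op->fields) == 0)
            place = i;               /* there is space for it here */
    }
    if (place == nwords) {
        assert(nwords < MAX_WORDS);
        words[nwords].fields = 0;
        words[nwords].reads = 0;
        words[nwords].writes = 0;
        nwords++;
    }
    words[place].fields |= op->fields;
    words[place].reads  |= op->reads;
    words[place].writes |= op->writes;
    return place;
}
```

Operations are offered to compact_add in the order the code generator produces them. An independent operation that uses a free field migrates upward into an earlier microword, while an operation that needs an earlier result stays behind it.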


In-line assembler code 


If it is necessary to code key sections of a program in 
assembler, the compiler allows the user to include as- 
sembler code in-line. In order for in-line micro-assembler 
code to share data with compiled code, there is also a 
mechanism for in-line code to refer to register variables 
by the names they were declared with (rather than by 
number). 


The overall aim is to provide a compiler which is inexpen- 
sive to build, simple and robust in construction, and can 
be relied upon to generate correct code. Although the 
compiler does take care of a great many housekeeping 
details (such as register number assignment and 
“constant folding”), it does not attempt to perform com- 
plex global flow analysis and optimization. Instead, the 
burden of doing so is placed on the programmer. Fortu- 
nately, the C language is designed to permit you to 
perform in source code the kinds of optimizations that 
optimizing compilers usually do. For instance, it is easy 
to recode array references in inner loops to use pointer 
operations instead. 
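As a small illustration of this kind of source-level optimization, the two functions below compute the same sum. The function names are ours, and how much the second form saves depends entirely on the target micro-architecture.

```c
#include <stddef.h>

/* Array form: each a[i] implies a subscript calculation (base plus index). */
int sum_indexed(const int a[], size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer form: the address is simply stepped each time round the loop. */
int sum_pointer(const int *a, size_t n)
{
    int total = 0;
    const int *p = a;
    const int *end = a + n;

    while (p < end)
        total += *p++;
    return total;
}
```

On hardware with a post-increment addressing operation, the *p++ in the second loop can map onto a single memory addressing step.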


There are many advantages to using Microcode C to
write microcode. Programs are more readable, more
comprehensible, and more maintainable. The use of a
high level language dramatically increases productivity
and makes it much, much easier to try out different
approaches during software development.


Hardware Design Considerations


If you are in the fortunate position of being in the process 
of designing new hardware and you want to know how to 
make it easy for a compiler to produce code for it, here 
are a few ideas. 


ALU 


To begin with, it is always nice if the ALU supports “three 
address code”, which means you can add register A to 
register B and place the result in register C in one 
instruction. 


Second best, but also acceptable, is two address code, 
in which you add register A to register B and place the 
result in register B in one instruction. 


In general, it is preferable for compiling purposes if any of
the following can be accomplished in one instruction: 


add a register to a register 
move the contents of a register to a second register 
add a constant to a register 


Although these would seem to be fairly simple things to 
do, it is surprising how many micro-architectures are
unable to carry them out. You should not get the idea that
it would not be possible to generate a microcode compiler
for a given micro-architecture if it cannot perform the 
operations outlined above in one instruction. We recog- 
nize that many other factors, such as cost and board 
space, must be taken into account in your particular 
design and we are well aware of the dangers of over- 
specifying a design. 


For two address architectures, you should try if possible 
to avoid putting any restrictions on the second address, 
such as “the upper two bits of the second address must 
be the same as the upper two bits of the first address”. 
Such restrictions can be worked around successfully, but 
they can be a rich source of bugs and are acceptable only
if you are sure that the saving of a couple of bits in the 
microword will be worth all the trouble it will cause to both 
compiler writer and micro-programmer! 


Constant Field 


Most micro-architectures provide at least one constant
field in the micro-instruction word. This field is set with
constant data for the sequencer (jump addresses) or the
ALU. This field should be at least as wide as the maxi-
mum of the sequencer address width and the data
address width. In the best of all possible worlds, it should
also be as wide as the ALU and internal data paths. On
a machine with a 32 bit ALU, it may be too expensive to
reserve 32 microword bits for a constant field. One
solution is to reserve only 16 bits and load all constants
in two steps (load an upper data register from the con-
stant field and then source the constant field combined
with the upper data register). This solution can be made
somewhat more satisfactory if it is also possible to
treat the 16 bit data field as a 32 bit number in one or more
of the following ways:


zero extend the 16 bit constant on the left 
zero extend the 16 bit constant on the right 
sign extend the 16 bit constant on the left 
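The three extension modes and the two-step load can be written out in C as follows. The function names are ours, and the sketch assumes that "zero extend on the right" means the constant occupies the upper 16 bits with zeros below it.

```c
#include <stdint.h>

/* Zero extend on the left: the constant occupies the low 16 bits. */
uint32_t zero_extend_left(uint16_t k)
{
    return (uint32_t)k;
}

/* Zero extend on the right: the constant occupies the high 16 bits. */
uint32_t zero_extend_right(uint16_t k)
{
    return (uint32_t)k << 16;
}

/* Sign extend on the left: bit 15 is copied into bits 16 to 31. */
uint32_t sign_extend_left(uint16_t k)
{
    return (k & 0x8000u) ? (0xFFFF0000u | (uint32_t)k) : (uint32_t)k;
}

/* Two-step load of an arbitrary 32 bit constant through a 16 bit field. */
uint32_t load_two_step(uint16_t upper, uint16_t lower)
{
    uint32_t upper_reg = zero_extend_right(upper);   /* step 1: upper data register */
    return upper_reg | zero_extend_left(lower);      /* step 2: combine with the field */
}
```

Small positive constants, small negative constants, and constants that are multiples of 65536 each need only one of the single-step modes, which is why these modes make the 16 bit field more satisfactory.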


Sequencer 


In order to implement jump tables for switch state-
ments and to allow computation of addresses for indirect
function calls, it is desirable if an address for the se-
quencer chip can be computed in the ALU. Typically this
can be done by providing an external register which can 
be written to from the ALU’s Y bus and then read into the 
sequencer using its “direct” inputs. 


Similarly, if the sequencer contains a loop counter (as the
Am29331 does), it would be nice if it could be loaded with
an arbitrary value computed at run time in the ALU. This
could be done using much the same mechanism as 
described above. 


For branching within the microprogram, it is most desir-
able if there is a field in the micro-instruction which is big
enough to hold the maximum microcode address. It
should be possible to branch to an arbitrary microcode
location in one micro-instruction. The address should be
in one contiguous field of the micro-instruction. Although
these ideas may seem obvious, we have seen several
systems which ignored them. For instance, one system
required the branch address to be loaded into a special
register, with the actual jump in a subsequent instruction.
Another system used a 4 bit “page register” with a 12 bit 
sequencer to address a 16 bit microcode address space. 
Although it was feasible to develop a compiler for both of 
these systems, the hardware design made all branches 
relatively expensive in the first case and all subroutine 
calls relatively expensive in the second case. 


In order to achieve the maximum possible instruction 
rate, most systems are designed so that a conditional 
branch in one instruction is made based on condition 
codes computed in the immediately previous instruction. 
In some systems, all condition codes are latched in a
register at the end of the first instruction, so that any one
can be tested in the second. In other systems, the
condition code to be tested is selected at the end of the
first instruction and only the one selected bit is latched, in
order to save a couple of chips. A microcode compiler
can be made to cope with either way of doing things,
although the first is preferable.


In general, compiled code cannot always benefit from 
this pipelining of ALU and sequencer operations. A nice 
feature, which you might consider including in your 
design, would be to have an extra bit in the instruction 
which, when set, would cause the cycle length to be 
doubled. If the condition code were available halfway 
through the double cycle, then it would be possible to 
code a conditional test and a branch in the same instruc- 
tion. Although this would not save any time, it would save 
on expensive microword space. 


Floating Point 


It is a relatively simple task to generate code for low 
latency parts, such as the Am29325. 


Integer Multiplier


Multiplications are often generated by compilers during
subscript calculations, if the size of the object being
subscripted is not a power of 2. In order of increasing cost
and speed, there are three ways to provide for multipli-
cation in a bit slice design. The cheapest is to simply use
the integer ALU to perform the standard shift and add 
algorithm, which costs one machine cycle per result bit 
(e.g. 32 cycles for a 32 by 32 bit multiplication). The next 
option is to provide a multiplier which can multiply ad- 
dress offsets, but not data, in one cycle. For instance, if 
the data paths were 32 bits, but the address width was 
only 16 bits, you could provide a 16 by 16 bit multiplier. 
This would take one cycle to compute a 16 bit offset, but 
would require four cycles to compute a 32 bit result. The 
fastest option is to use a multiplier, such as the 
Am29C323, which can handle either address or data 
calculations in one cycle. 
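The first two options can be modelled in C. The sketch below is illustrative only: the function names are ours, each loop iteration stands for one machine cycle of the shift and add algorithm, and we read the "four cycles" of the second option as the four 16 by 16 partial products needed for a full 32 by 32 bit multiplication.

```c
#include <stdint.h>

/* Shift and add: one iteration per multiplier bit, 32 in all.
   Returns the low 32 bits of the product. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    int bit;

    for (bit = 0; bit < 32; bit++) {
        if (b & 1u)
            product += a;        /* add the shifted multiplicand */
        a <<= 1;
        b >>= 1;
    }
    return product;
}

/* One pass through a 16 by 16 bit multiplier. */
static uint32_t mul16(uint16_t x, uint16_t y)
{
    return (uint32_t)x * y;
}

/* A 32 by 32 bit product built from four 16 by 16 partial products. */
uint64_t mul_four_partials(uint32_t a, uint32_t b)
{
    uint16_t al = (uint16_t)(a & 0xFFFFu);
    uint16_t ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)(b & 0xFFFFu);
    uint16_t bh = (uint16_t)(b >> 16);
    uint64_t ll = mul16(al, bl);
    uint64_t lh = mul16(al, bh);
    uint64_t hl = mul16(ah, bl);
    uint64_t hh = mul16(ah, bh);

    return ll + (lh << 16) + (hl << 16) + (hh << 32);
}
```

For a subscript calculation whose operands both fit in 16 bits, only the first partial product (ll) is needed, which is the one-cycle address offset case.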


Scratchpad Memory 


In order to be able to declare non-register variables, there
must be a memory somewhere to hold them. In most
systems, this takes the form of a small, fast, local mem-
ory. In others, the bit slice processor uses memory on the
main system bus.


If the memory is on the main system bus (a VME Bus or
a Multibus, for instance), then it is usually a byte address-
able memory. If your processor is to perform only word
accesses on such a memory, then you might consider
setting up the addressing so that the processor puts out
a word address to the bus interface, which converts the
address to a byte address. For instance, suppose the bus
has 24 address lines. If you use byte addresses in the
processor, then any time some C code needs to do the
subscript calculation


a[i],


it has to multiply the subscript by the size of the object
being subscripted. Although this multiplication can be
converted into a shift if the size is 16 or 32 bits, this
still imposes an unnecessary penalty for such a routine
operation. A better scheme (for a processor whose word
size is 16 bits) would be to use 23 bit addresses in the
processor and have the bus interface in effect shift the
address left by one and always supply a least
significant bit of zero. For a processor which is 32 bits
wide, you would use a 22 bit address in the processor,
shift the address by two, and force the two least signifi-
cant bits to zero.
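The address conversion performed by the bus interface can be modelled in a few lines of C. This is a sketch of the scheme for the 24 line bus used in the example above; the function names are ours.

```c
#include <stdint.h>

/* 16 bit processor: a 23 bit word address becomes a 24 bit byte address
   whose least significant bit is always zero. */
uint32_t bus_address_16(uint32_t word_addr)
{
    return (word_addr & 0x7FFFFFu) << 1;
}

/* 32 bit processor: a 22 bit word address becomes a 24 bit byte address
   whose two least significant bits are always zero. */
uint32_t bus_address_32(uint32_t word_addr)
{
    return (word_addr & 0x3FFFFFu) << 2;
}

/* With word addresses inside the processor, the address of a[i] for
   word-sized elements is just base plus index: no multiply or shift. */
uint32_t element_word_address(uint32_t base, uint32_t i)
{
    return base + i;
}
```

The shift costs nothing at run time because it is done by wiring in the bus interface, whereas the equivalent shift in microcode would cost an operation on every subscript calculation.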


Multiple Memories 


One of the fundamental features of C is that it assumes 
that all memory accesses are identical and that a pointer 
can point to any addressable memory location. This 
makes it very tricky to support a system with memories 
with overlapping address spaces. For instance, if you 
have a pointer stored somewhere and you want to 
indirect through it, there are two problems. First, you 
must identify the memory in which the pointer is stored. 
Second, you must identify the memory to which the 
pointer points. 


In most bit slice designs, the problem of overlapping 
address spaces usually comes up in one of two ways. 


In the first and simplest case, memory address space 
overlap almost always occurs with control store memory 
and scratch pad memory. However, it is easy to tell which 
is which if control store memory contains only code and 
scratch pad memory contains only data (which may 
include pointers to functions in control store memory). 


In the second case, the problem may arise if the hard-
ware can operate on a host bus, such as a VME bus.


While it is conceptually possible to support an architec- 
ture featuring multiple memories of different granulari- 
ties, the implementation of the concept would add a great 
deal of complexity to the code generator, because ob-
jects have different sizes in different memories. For
instance, a structure in one memory would have a differ-
ent set of offsets to its members than the same structure
in a memory with different granularity.


Usually, when Microcode C is implemented on a proces-
sor, one memory is picked to be the default system
memory, as far as the microcode is concerned. All
declared variables are stored in this memory. Space is
also allocated within the memory for the run-time stack,
if that is required for the implementation. All addressing
operations generate addresses in this memory. All indi-
rection operations (including array/structure/union refer-
ences) generate addresses within this memory.


Built-in Functions 


Other memories (if any) are treated as peripheral devices 
and built-in functions are implemented to support them. 
For instance, a very common configuration might include
a word-addressed 4K static memory and an interface to 
a byte-addressed VME bus. A Microcode C implemen- 
tation for such a machine would designate the static 
memory as the main memory. The VME bus would be 
supported by a set of built-in functions, such as 


set_vme_address( expr );

result = read_vme_bus(); /* at address */
write_vme_bus_byte( expr ); /* at address */
write_vme_bus_word( expr ); /* at address */
write_vme_bus_long( expr ); /* at address */


The disadvantage of this scheme is that it makes it 
impossible to use C structure references to refer to such 
external data. However, it does make it easier to support 
some of the more esoteric interfaces, such as those 
which support pre-fetching of data through FIFOs. 


Addressing 


In general, the ALU should be at least as wide as the 
memory address register of the main system memory. If 
it is not, then it is necessary to resort to either segmenting
the address space or using very expensive double preci- 
sion integer arithmetic for all address calculations. Nei- 
ther of these two alternatives is very attractive! 


In some micro-architectures, the main integer ALU 
handles all the work of generating memory addresses. In
others, there is a separate functional unit, often featuring 
pointer and offset registers. These units are usually very 
effective for the special purposes for which they are 
designed but often lack certain fundamental functionality 
which is very useful to the C compiler. 


The main deficiency, which we have seen in some 
systems, is the lack of the ability to generate an address 
based on taking a constant offset from a pointer register, 
without writing the resultant address back into the pointer 
register. 


Given that MAR stands for “Memory Address Register” 
and that “constant” could be negative, the basic function- 
ality which is desirable for the compiler would include 


MAR = constant
MAR = arbitrary expression result
MAR = pointer register + constant
MAR = pointer register + arbitrary expression result
pointer register = constant
pointer register = arbitrary expression result


Note that this by no means excludes additional function-
ality, such as offset registers or multiple MARs. An actual
hardware implementation could provide several vari-
ations on this scheme, such as providing operations in
which a small constant is implicit in the operation, rather
than having to be placed into a literal field. This allows
certain memory addressing operations to be combined
with operations which use the literal field.


To efficiently support pre-increment and pre-decrement 
operations we add 


MAR = pointer register = pointer register + constant 


To efficiently support post-increment and post-decre-
ment operations, we add


MAR = pointer register 
pointer register = pointer register + constant 


with the sense that this is done in one operation. 
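The addressing operations listed above can be modelled on a host machine as follows. This is only a behavioural sketch: the 16 word memory, the structure and function names, and the use of ordinary C statements for what the hardware would do in one operation are all assumptions made for the example.

```c
#include <assert.h>

#define MEM_WORDS 16

static int memory[MEM_WORDS];   /* simulated word-addressed memory */
static int mar;                 /* Memory Address Register */

struct pointer_reg { int value; };

/* MAR = pointer register + constant (the pointer register is unchanged). */
void mar_from_offset(const struct pointer_reg *p, int constant)
{
    mar = p->value + constant;
}

/* MAR = pointer register = pointer register + constant.
   Supports pre-increment and pre-decrement, as in *++p and *--p. */
void mar_pre_adjust(struct pointer_reg *p, int constant)
{
    p->value += constant;
    mar = p->value;
}

/* MAR = pointer register; pointer register = pointer register + constant.
   Supports post-increment and post-decrement, as in *p++ and *p--. */
void mar_post_adjust(struct pointer_reg *p, int constant)
{
    mar = p->value;
    p->value += constant;
}

int mem_read(void)
{
    assert(mar >= 0 && mar < MEM_WORDS);
    return memory[mar];
}

void mem_write(int data)
{
    assert(mar >= 0 && mar < MEM_WORDS);
    memory[mar] = data;
}
```

The first function is the one whose absence is noted above as the main deficiency: it forms an address at a constant offset from the pointer register without disturbing the register, which is what a reference such as p[-1] or a structure member access through p requires.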


The Stack


Since the stack pointer (SP) is simply a dedicated pointer 
register, all the operations on pointer registers described 
above also apply to the SP. 


Most modern microprocessors reserve two registers to 
control the stack: the SP (which points to the top of the 
stack) and the Frame Pointer (FP) which points to the 
base of the current stack frame. The use of the FP allows 
a compiler to use stack offsets which are constant irre- 
spective of how much has been pushed onto the stack 
(for temporaries or called function arguments). 


In the interest of avoiding extra overhead on function 
entry and exit and at the expense of some extra internal 
housekeeping, the Microcode C compiler dispenses with 
the use of an FP and uses the SP only. The disadvantage
of not keeping a separate FP is that the task of generating 
a stack trace back becomes much more complicated. 
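The "extra internal housekeeping" amounts to the compiler keeping a compile-time count of how much has been pushed since function entry and adding it to every stack offset. A minimal sketch follows, assuming a stack that grows toward lower addresses with the SP pointing at the most recently pushed word; the structure and function names are ours, and the actual conventions of a given Microcode C implementation may differ.

```c
#include <assert.h>

struct sp_tracker {
    int sp;       /* simulated stack pointer (word address of top of stack) */
    int pushed;   /* compile-time count of words pushed since function entry */
};

/* Function entry: reserve space for the local variables. */
void enter_function(struct sp_tracker *t, int sp_at_entry, int local_words)
{
    t->sp = sp_at_entry - local_words;
    t->pushed = 0;
}

/* Temporaries or call arguments are pushed; the compiler notes how many. */
void push_words(struct sp_tracker *t, int words)
{
    t->sp -= words;
    t->pushed += words;
}

void pop_words(struct sp_tracker *t, int words)
{
    assert(words <= t->pushed);
    t->sp += words;
    t->pushed -= words;
}

/* Offset from the SP to a local: varies as the stack depth changes. */
int local_sp_offset(const struct sp_tracker *t, int slot)
{
    return slot + t->pushed;
}

/* Address actually generated: SP plus the adjusted offset. */
int local_address(const struct sp_tracker *t, int slot)
{
    return t->sp + local_sp_offset(t, slot);
}
```

With a frame pointer the offset to each local would be a fixed constant. Here the offset changes with every push and pop, but the address it produces stays the same, so no second register has to be saved and restored on function entry and exit.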


Bit Slice Software 

321 Auburn Drive 

Waterloo, Ontario, N2K 2X7 
(519)885-4313 

© 1987 by R. Preston Gurd 
5.7 WRITABLE CONTROL STORE


5.7.1 Agility 
AG-11B Microprogram Development 


The AG-11B combines with your IBM personal computer
to create a complete development station for micropro- 
gram-based designs. Its high performance and very low 
cost open new design opportunities for using flexible bit 
slice, ASIC, DSP, and 32-bit building block architectures. 
The AG-11B provides high speed in-circuit emulation of 
your design target’s ROM or PROM. 


Writable Control Store 


The heart of the AG-11B is the Writable Control Store 
module (WCS) resident within your IBM PC. Each WCS 
has a memory array 96 bits wide by 4096 words deep 
which can be increased in width and/or depth with addi-
tional modules to suit virtually any size microprogram-
med application. Your microcode is loaded into WCS
memory using your personal computer and AG-11B
software. The WCS utilizes high-speed static RAM
which provides a 50 ns maximum access time to your
target.


Configurable Buffer Interface and Software 


The AG-11B offers maximum flexibility in configuring 
for your particular design. The WCS interfaces to your 
target through the Target Interface Board. The hard- 
ware is complemented by the AG-11B software, which 
allows easy software control of your configuration vari- 
ables. The AG-11B software, which is either menu- 
driven or command-line driven, provides control of 
breakpoint and target control signals and complete WCS 
card diagnostics. 


mcASM Microcode Assembler 


Included optionally with the AG-11B is the mcASM Struc-
tured Microcode Assembler. Developed as a joint effort
between Microtec Research and Advanced Micro De-
vices, this assembler features macro support, design
rule checking, nonpositional keyword syntax, and relo-
catable segments. mcASM lets you define your target’s
architecture and assembly mnemonics, and then pro-
duces executable microcode for your target in a format
that is easily loaded into the WCS.


Applications 


Microprogrammed architectures are increasingly used to 
boost performance in applications such as graphics, 
peripheral controllers, communications, military, robot- 
ics, and industrial automation. The AG-11B supports all 
architectures which use microprogramming, including bit 
slice as well as ASIC, DSP, and 32-bit building block 
devices. And since it is not designed for any specific 
architecture, the AG-11B is adaptable to any micropro- 
grammed product. 


Cost and Time Savings 


The AG-11B: 


uses the computing power of an inexpensive 
IBM PC 


comes at a fraction of the cost of other micro- 
code development stations 


is a cost-effective way to set up multiple 
development stations so that microcode devel- 
opment work can proceed in parallel 


lets you avoid the time and expense of burning 
new PROMs after each change to your micro- 
code 


increases the productivity and morale of 
firmware engineers 


is available immediately and can be set up 
quickly and easily 


For more information, contact Agility, 1290 Lawrence Station 
Road, Sunnyvale, CA 94089, (408) 744-0806. 


Reprinted with permission from Agility 
Bipolar building blocks deliver supermini speed to microcoded systems


As CMOS processes start to encroach on the
performance of bipolar circuits, bipolar
technology is taking the next step to
keep itself in the lead for the highest speed
systems. A family of five bipolar VLSI com-
putational circuits—fabricated with a scaled,
ion-implanted, oxide-isolated process and three
levels of metal interconnections for high den- 
sity — provides a set of functionally partitioned 
microprogrammable VLSI building blocks for 
systems such as superminicomputers, digital 
signal processors, high-speed controllers, and 
many others. The modularity of the system 
functions ensures that the chips can meet the 
performance requirements of a general- 
purpose superminicomputer, as well as those of 
an image processor, which are radically differ- 
ent from each other. 

Included in the family are three parts that 
form the core of a general-purpose micro- 
programmed system: a 32-bit arithmetic and 
logic unit (ALU), a 16-bit microprogram 
sequencer, and a 64-by-18 four-port, dual- 
access RAM. And, for systems that do a large 
number of multiplications or floating-point 
operations, two performance accelerators (a
32-by-32-bit multiplier and a 32-bit floating-
point processor) will be available to tie onto the
buses (see Design Entry, p. 246).

The chips offer high performance, a flexible 
architecture, and microprogrammability, and 
even address the problem of fault detection for 
data integrity. These circuits can thus support 
an extremely fast microcycle—about 80 ns 
(projected). That high speed is the result of 
several design considerations. First, each part is de-
signed internally with emitter-coupled logic
but has TTL-compatible inputs and outputs.
Second, more power was allocated to the logic 
circuits used in the critical paths than for logic 
in the noncritical paths on each chip, to max- 
imize the speed. Third, by integrating highly 
specialized logic on chip it is possible to execute 
very complex operations in a single cycle. 

The microprogrammability of this chip set 
offers several benefits to the system designer. 
It provides a structured and systematic ap- 
proach for implementing the control mech- 
anism of the system, and like the bit slices, it al- 
lows the instruction set to be customized to suit 
the designer’s application (see “Architectural 
Limitations of Bit Slices,” opposite). And 
several versions of the initial design can be 
tested, or current designs can be enhanced 
simply by changing the microcode.

Thus, the functionally partitioned Am29300 
family overcomes all of the performance penal- 
ties of bit-slice structures, while maintaining 
its ability to form a wide variety of architec-
tures. Even though the chips are designed to
work together as a family, each can also be used
independently in an application that requires
its unique capabilities.


Pipelines are out


The flexibility of the Am29300 family is 
largely due to a decision not to place pipeline 
stages within the functional blocks. Not includ- 
ing the pipeline registers inside incurs some 
off-chip delays. This is a small price to pay to al- 
low system designers to optimize the pipeline 
structure for their individual needs. Moving the
register file out of the functional block for the
ALU also slows things down. At the same time
it does not force a fixed register size on the user,
enabling systems to be created with dedicated
registers, register windows, or register banks,
all with neither fixed depth nor width.

Additionally, the high level of integration
helps eliminate the propagation delays often 
encountered when signals must go from chip to 
chip. The use of VLSI also results in fewer parts 
at the system level, which, in turn, conserves 
power (usually many watts in the case of bi- 
polar systems) and board space. Lastly, a com- 
plete 32-bit solution is provided for applications 
that require increased precision for arithmetic 
operations, high memory bandwidth, and a
large addressing capability (4 billion bytes) to
support virtual memory systems (Fig. 1).


Architectural limitations of bit slices


The limited performance of bit-slice circuits can
be improved by increasing the width of the slices.
That higher level of integration results in higher
performance by reducing the number of off-chip
delays while preserving the flexibility that has
made bit-slice systems so attractive. However, as
higher levels of integration become possible, two
inherent problems with bit-slice architectures
will limit their ultimate speed. The first involves
the off-chip delays inherent in cascading. For ex-
ample, the carry chain is usually the slowest path
of an ALU. Breaking this chain between slices in-
troduces off-chip delays into the critical path.

The second problem is that the functional needs
of many systems do not slice well. Barrel shifters
and prioritizers are especially difficult to cascade.
Unfortunately, the abilities to perform N-bit shifts
and locate the position of leading 1s are of greatest
importance in applications that require heavy
number crunching and manipulation of data
fields, such as image processing, graphics, data-
base management, and controllers. These are pre-
cisely the applications whose need for speed forces
the use of bit-slice devices. The system per-
formance is compromised not only because these
operations must be done bit by bit, but also be-
cause many high speed algorithms cannot be effi-
ciently implemented.

The performance of a system depends, not 
just on its raw computing speed, but on its abili- 
ty to respond to events such as interrupts and 
traps. For example, the Am29331 sequencer re- 
sponds to both interrupts and traps at the mi- 
croprogram level very quickly, and its response 
is completely transparent to the interrupted 
microroutine. Also, the Am29332 ALU indirect- 
ly supports the handling of these events by al- 
lowing its internal state to be saved or restored. 

The Am29332, a noncascadable 32-bit-wide
ALU, provides fast number crunching, high
data transfer rates, and powerful bit-manip-
ulation capabilities. Intended to be used with
the Am29334 dual-ported RAM, which serves
as an external register file, the ALU has two
32-bit input buses (DA and DB) and one 32-bit
output bus (Y).

Internally, the device has a 32-bit data path 
that interconnects its various functional 
blocks. These blocks include various shifters 
and multiplexers, a mask generator, a funnel 
shifter, the ALU proper, a priority encoder, a 
parity generator and checker, a master-slave 
comparator, and the status and Q registers 
(Fig. 2). The ALU proper has three 32-bit in- 
puts: R, S and M. The R input comes from the 
funnel shifter, the M input from the mask gen- 
erator, and the S input from a variety of sources 
—the DA or DB buses, status register, or the Q 
register. 

The power and flexibility of the Am29332 
come partly from its ability to perform oper-
ations on various data types. It can operate on
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4. A conventional CPU, built with Am29300 building blocks, forms the focal point of an 
extremely compact system that cycles as fast as 80 ns. 
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variable bytes, variable-length bit fields, or sin- 
gle bits. This is made possible by the internal
mask generator, which creates a 32-bit mask
for each instruction (with no time overhead).
The mask is used as an additional operand in 
each instruction to allow the operation on only 
selected data widths. 

The type of mask generated depends on the 
type of instruction. For instructions that oper- 
ate on variable bytes (1, 2, 3 or 4 bytes) the mask 
is a fence of 1s (bit 0 aligned) for all low-order 
selected bytes with a fence of 0s for all high- 
order unselected bytes. Instructions that oper-
ate on variable-length bit fields require a mask
that is a string of contiguous 1s for all selected
bit positions and 0s for all unselected bit posi- 
tions. In cases where the field exceeds the 32-bit 
boundary, the mask does not wrap around, thus 
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allowing operation on a contiguous field across 
a word boundary. For instructions that operate 
on a single bit, the mask is a 1 for the selected bit
position and 0s for the other unselected bits.

For most single-operand instructions, the 
unselected bit positions pass the corresponding 
bits of the operand unmodified. For most two-
operand instructions, the unselected bit posi-
tions pass the corresponding bits of the operand
on the DB input unmodified. Thus, for two-
operand instructions the mask allows the
merging of two operands in a single cycle. In ad-
dition to being used internally, the mask can be
sent out over the Y bus, permitting the gener-
ator to be used as a pattern generator for test-
ing purposes.
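
The three mask types and the single-cycle merge described above can be sketched in software. This is an illustrative model only, not the chip's logic: the function names are hypothetical, and it assumes that the position specifier names the least-significant bit of a field.

```python
WORD = 0xFFFFFFFF  # 32-bit data path


def byte_mask(num_bytes: int) -> int:
    """Fence of 1s over the low-order 1 to 4 bytes, aligned to bit 0."""
    if not 1 <= num_bytes <= 4:
        raise ValueError("num_bytes must be 1, 2, 3 or 4")
    return (1 << (8 * num_bytes)) - 1


def field_mask(position: int, width: int) -> int:
    """Contiguous 1s for a bit field. Assumption: `position` is the field's
    least-significant bit. Bits past bit 31 are dropped, not wrapped."""
    return (((1 << width) - 1) << position) & WORD


def bit_mask(position: int) -> int:
    """A single 1 at the selected bit position."""
    return (1 << position) & WORD


def merge(selected: int, passthrough: int, mask: int) -> int:
    """Take masked bits from `selected`, all other bits from `passthrough`."""
    return (selected & mask) | (passthrough & ~mask & WORD)
```

For example, merge(result, db, byte_mask(1)) replaces only the low byte of db with the low byte of result, which is the two-operand merging behavior the text describes.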

To speed various mathematical and logical 
operations, many circuits have started to in- 
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2. To connect its various internal functional blocks, the Am29332 ALU 
employs a 32-bit bus. Among the chip’s major features are a 64-bit fun- 
nel shifter, parity checking and generation, and a basic 32-bit ALU that 
has three input ports. The processor also has three 32-bit ports through
which it transfers data into and out of the chip.
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clude a barrel shifter, which has an N-bit input 
and an N-bit output. The barrel shifter would 
be used to shift or rotate the operand either up 
or down from 0 to N bits in a single cycle. Such 
high-speed shifting is very useful in operations 
such as the normalization of a mantissa for 
floating-point arithmetic or in applications in 
which the packing and unpacking of data are 
frequent operations. 

However, a more useful circuit is a funnel 
shifter, which can be thought of as having two 
N-bit inputs and one N-bit output. Just such a 
circuit (with 32-bit-wide ports) was included on 
the 29332. The circuit can perform all the oper- 
ations of a barrel shifter with capabilities ex- 
tended to two operands instead of one. In addi- 
tion, it can extract a 32-bit contiguous field 
across its two operands, a function very useful 
in several graphics applications. And any of its 
operations can be followed by a logical oper- 
ation, with both completed in a single cycle. 
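
The relationship between a funnel shifter and a barrel shifter can be illustrated with a short software model. This is a sketch under stated assumptions, not the Am29332 implementation: the function names are hypothetical, and the two 32-bit inputs are treated as one 64-bit value from which a 32-bit window is taken.

```python
WORD = 0xFFFFFFFF


def funnel_shift(high: int, low: int, shift: int) -> int:
    """Extract the 32-bit field that starts `shift` bits (0 to 32) above
    bit 0 of the 64-bit value formed by high:low."""
    if not 0 <= shift <= 32:
        raise ValueError("shift must be in 0..32")
    combined = ((high & WORD) << 32) | (low & WORD)
    return (combined >> shift) & WORD


def rotate_right(x: int, n: int) -> int:
    """Barrel-shifter rotate: feed the same operand to both inputs."""
    return funnel_shift(x, x, n % 32)


def shift_left(x: int, n: int) -> int:
    """Logical shift up by n (0 to 32): operand on top, zeros below."""
    return funnel_shift(x, 0, 32 - n)
```

Feeding two different operands gives the field extraction across a word boundary mentioned in the text; feeding the same operand twice reduces the circuit to an ordinary rotate.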


Setting the priorities 


Prioritization, useful for controlling N-way
branches, for performing normalizations, and in
graphic operations such as polygon fills, can
readily be handled by the ALU chip. The built-
in priority encoder sends out a 5-bit binary
weighted code that signifies the relative posi- 
tion of the most-significant 1 from the most- 
significant bit position of the byte width se- 
lected. That allows prioritization on either 8-, 
16-, 24-, or 32-bit operands. The priority encoder 
output can be passed on to the Y bus or stored in 
the status register. 

If, for example, prioritization is used to nor- 
malize a mantissa during a floating-point 
arithmetic operation, it requires two cycles. In 
the first, the mantissa is prioritized to deter- 
mine the number of leading 0s that need to be 
stripped off. In the next cycle, the mantissa is 
shifted up by the amount specified by the prior- 
ity encoder output. 
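
The two-cycle normalization can be sketched as follows. This is a hedged software model with hypothetical function names. It assumes the encoder output equals the number of leading 0s within the selected byte width, and it does not model what the chip reports for an all-zero operand.

```python
WORD = 0xFFFFFFFF


def priority_encode(value: int, num_bytes: int = 4) -> int:
    """Position of the most-significant 1, counted from the most-significant
    bit of the selected width (that is, the number of leading 0s)."""
    bits = 8 * num_bytes
    value &= (1 << bits) - 1
    if value == 0:
        raise ValueError("all-zero operand is not modeled in this sketch")
    return bits - value.bit_length()


def normalize(mantissa: int) -> tuple[int, int]:
    """Cycle 1: prioritize. Cycle 2: shift up by the encoder output."""
    leading_zeros = priority_encode(mantissa, 4)
    return (mantissa << leading_zeros) & WORD, leading_zeros
```

The second element of the returned pair is the amount by which a floating-point exponent would need to be adjusted after the shift.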

Relevant information for each operation per- 
formed by the chip is stored in the 32-bit status 
register after each microcycle. Each byte of the 
status word holds different information. The 
least-significant byte holds the position spec- 
ifier. The next most-significant byte holds the 
width specifier and three other bits that are 
used to test the comparison of unsigned and 
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signed operands. The next byte contains the 
Carry, Negative, Overflow, Link, Zero, M and S 
flags. The M flag stores the multiplier bit for 
multiply or the sign compare bit for signed di- 
vision, and the S flag stores the sign of the par- 
tial remainder for unsigned division. The most 
significant byte stores the nibble carries for 
BCD operations. 

The states of the Carry, Negative, Overflow, 
Link and Zero flags are available on the status 
pins, and the status multiplexer allows the user 
to select either the status of the previous in- 
struction (register status) or the status of the 
current instruction (raw status) to appear on 
the status pins. The raw status could be used to 
update an external macro status register. This 
also allows branching at either the micro- or
macro-level.

The Q shifter and Q register are primarily 
used to assemble the partial product or partial 
quotient in multiplication and division oper- 
ations. Variable bytes of the status and Q reg- 
ister can either be loaded via the DA and DB 
inputs or can be read over the Y bus. Thus sav- 
ing and restoring of the registers allows effi- 
cient interrupt handling after any microcycle. 
It is also possible to inhibit the update of both 
these registers by asserting the Hold pin. 


Powerful and orthogonal instructions 


The power of the ALU chip’s instruction set 
comes directly from the integration of several 
functional blocks mentioned earlier. The com- 
mands are symmetrical as well as orthogonal, 
to make it easier for a compiler to generate effi- 
cient code. Thus, any operation on the DA input 
is also possible on the DB input, and each in- 
struction is completely independent of its data 
type. 

Three-fourths of the instruction set consists 
of variable byte-width (one, two, three or four) 
operand instructions. The byte-width is se- 
lected by two bits in the instruction. For these 
operands, the instruction set supports all con- 
ventional arithmetic, logical and shift oper- 
ations. Arithmetic operations can be per- 
formed on both signed and unsigned binary 
integers. 

Additionally, the instruction set supports 
multiprecision arithmetic such as addition 
with carrying and subtraction with carrying or 
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borrowing. For all subtract operations it pro- 
vides the convenience of using borrowing in- 
stead of carrying by asserting the borrow pin. 
In this mode the carry flag is updated with the 
true Borrow. To allow efficient execution of 
macroinstructions the chip contains a Macro 
mode pin. When this pin is asserted, the chip al-
lows the external Macro-Carry and Macro-Link
bits, instead of their micro counterparts, to par-
ticipate in the operation.

Instructions that execute algorithms for the 
multiplication and division of signed and un-
signed integers in multiple cycles are also pro-
vided. For multiplication, the circuit supports
the modified Booth algorithm, yielding two 
product bits in one cycle. Both single-precision 
and multiprecision division of signed and un- 
signed integers are supported at the rate of one 
quotient bit in every cycle. 
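
To show why the modified Booth algorithm yields two product bits per step, here is a minimal software sketch of radix-4 Booth recoding. It is not the chip's microcode sequence; the function name is hypothetical, and Python's unbounded integers stand in for the ALU and Q register pair.

```python
def booth_radix4_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Signed multiply by radix-4 (modified) Booth recoding.

    Each loop pass examines three multiplier bits (two new ones plus the
    previous high bit) and retires two multiplier bits, so a 32-bit
    multiplier needs 16 passes. `bits` must be even.
    """
    # Booth digit for the bit triplet (b[i+1], b[i], b[i-1]).
    digits = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
              0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    b = multiplier & ((1 << bits) - 1)  # two's complement bit pattern
    product = 0
    previous_bit = 0                    # implicit b[-1] = 0
    for i in range(0, bits, 2):
        triplet = (((b >> i) & 0b11) << 1) | previous_bit
        product += (digits[triplet] * multiplicand) << i
        previous_bit = (b >> (i + 1)) & 1
    return product
```

Each recoded digit is one of 0, +1, -1, +2 or -2 times the multiplicand, so every pass needs at most one add or subtract of a shifted operand.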

Besides binary integers the instruction set 
provides basic arithmetic operations for 
binary-coded decimal (BCD) numbers. By oper- 
ating directly on the decimal numbers created 







3. To help ensure system integrity, two Am29332 
processors can be set for master and slave oper- 
ation. Both chips perform the same operation in par- 
allel, and any difference in their results is flagged as 
an error. The master also checks its internal result 
against the data on the output bus to make sure 
that no other device (such as device X) is turned on 
at the same time. 
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in most business applications, significant pro- 
cessing time is saved by eliminating the need to 
convert from binary to BCD and vice versa. 
Also, the round-off errors involved in con- 
verting from one base to the other are elimi- 
nated. 
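
Direct BCD arithmetic can be illustrated with a digit-by-digit addition in which each nibble produces its own decimal carry. This is a sketch only: the function name is hypothetical, and it does not reproduce how the Am29332 uses its stored nibble carries internally.

```python
def bcd_add(a: int, b: int, digits: int = 8) -> tuple[int, int]:
    """Add two packed-BCD numbers (one decimal digit per nibble).

    Returns (packed BCD sum, carry out of the top digit)."""
    result = 0
    carry = 0
    for i in range(digits):
        digit_sum = ((a >> (4 * i)) & 0xF) + ((b >> (4 * i)) & 0xF) + carry
        carry = 1 if digit_sum > 9 else 0   # decimal carry out of this nibble
        if carry:
            digit_sum -= 10                 # decimal correction
        result |= digit_sum << (4 * i)
    return result, carry
```

No conversion to or from binary takes place at any point, which is the saving the text refers to.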

The last group of instructions was created to 
support variable-length bit fields (1 to 32) and 
single-bit operands. The position and width of 
the field can be specified by either the position 
and width inputs or by fields in the status reg- 
ister, thereby saving bits in the microcode. 
Most of the time, the position and width are 
determined dynamically. It is therefore diffi- 
cult to supply them via the microinstructions. 
For single bit operations only the position spec- 
ifier is needed. 

Bit-manipulation instructions include set- 
ting, resetting, or extracting a single bit of the 
operand or the status register. Logical oper- 
ations on either aligned or nonaligned fields in 
the two operands include OR, AND, NOT and 
XOR. In the case of nonaligned fields it is as- 
sumed that at least one of the fields is aligned to 
bit position 0. It is also possible to extract a field 
from one operand and insert it into another 
operand or extract a field across two operands. 
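
Extracting a field from one operand and inserting it into another can be sketched as below. The function names are hypothetical, and the sketch assumes that the position specifier is the least-significant bit of the field; on the chip these steps are carried out by the mask generator and funnel shifter rather than by separate shift and mask operations.

```python
WORD = 0xFFFFFFFF


def extract_field(src: int, position: int, width: int) -> int:
    """Return the `width`-bit field of `src` starting at `position`,
    aligned to bit 0."""
    return (src >> position) & ((1 << width) - 1)


def insert_field(dest: int, field: int, position: int, width: int) -> int:
    """Place the low `width` bits of `field` into `dest` at `position`,
    leaving every other bit of `dest` unchanged."""
    mask = (((1 << width) - 1) << position) & WORD
    return (dest & ~mask & WORD) | ((field << position) & mask)
```

Chaining the two calls moves a field between operands at different bit positions, for example insert_field(dest, extract_field(src, 16, 16), 0, 16).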


Enhancing system integrity 


The growing need for data integrity has been 
addressed at both the system and the chip level 
by including hardware for fault detection. Dur- 
ing calculations, byte-wide even parity is gener- 
ated for the data result by the ALU and stored 
with the data in the external RAM. Byte-wide 
even parity is also checked at the ALU inputs 
and any error is flagged.
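
Byte-wide even parity can be modeled in a few lines. This is an illustrative sketch with hypothetical function names. The last comment assumes that a floating TTL bus reads as all 1s on both the data and the parity lines.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def generate_parity(word: int) -> list[int]:
    """One even-parity bit per byte of a 32-bit word, low byte first."""
    return [even_parity_bit((word >> (8 * i)) & 0xFF) for i in range(4)]


def check_parity(word: int, parity_bits: list[int]) -> bool:
    """True when the received parity bits match the received data."""
    return generate_parity(word) == list(parity_bits)


# Assumed floating bus: eight data 1s plus a parity 1 per byte gives nine 1s,
# an odd count, so check_parity(0xFFFFFFFF, [1, 1, 1, 1]) reports an error.
```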

Even parity is specifically used to check for a 
floating TTL bus. Thus, all interchip connec-
tions are checked out. In addition, hardware for
functional verification is provided on both the
sequencer and the ALU. Functional verification
can be implemented by using two similar de-
vices in the master and slave mode (Fig. 3). In
that setup, both chips perform the same oper- 
ation, with any difference in their outputs being 
flagged as an error. The slave-mode chip’s bidi- 
rectional buses operate in their input mode, al- 
lowing the master to compare its own internal 
result with that of the slave on every cycle. Ad- 
ditionally, the master checks the output bus to 
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make sure that no other device is turned on at 
the same time. 

As mentioned earlier, the ALU architecture 
was designed to use an external register file. 
Keeping the file external to the chip permits the 
user to expand it to meet any system need. The 
Am29334, a high-speed 64-word-by-18-bit dual-
access RAM, provides two independent data in-
put ports and two independent data output
ports (Fig. 4). Each port can be read from or 
written to using the separate inputs and out- 
puts. The two accesses are independent except 
for the case when simultaneous write opera- 
tions are done to the same word—in which case 
the result is undefined. The read address inputs 
and the write address inputs of each side are se- 




4. The dual-access RAM serves as an external reg- 
ister file for the arithmetic processor chip. The 
Am29334 holds 64 words, each 18 bits long. Two
chips are often connected to build a RAM block with 
four data outputs, two data inputs, and six address 
lines. Each port of the RAM can be independently 
accessed to read or write. 
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parate in order to save the cost and time delay 
of external multiplexing between a read ad- 
dress and a write address. 

The word width of 18 bits allows the RAM to 
store two bytes plus a parity bit for each. Each 
side has separate write enable for the lower and 
upper nine-bit bytes and a common write en- 
able that also switches the address multiplexer. 
The actual write is delayed internally to allow 
the write address to set up internally before 
writing starts. 

It is possible to build a RAM with four data 
outputs, two data inputs and six addresses by 
using two dual-access RAMs and on each side 
connecting the data input, write address and 
write enables of one RAM in parallel with the 
corresponding inputs of the other RAM. This 
expanded RAM may be used in concurrent pro- 
cessing applications in which an ALU and an 
adder (which generates the address) do their 
computations—this yields a result and an ad- 
dress in parallel. The two values can then be fed 
simultaneously to the multiport memory. 


The sequencer controls the show 


The cycle time of the microprogrammed sys- 
tem is dependent on both the control path (i.e., 
sequencer and microprogram memory) and the 
data path (i.e., register file and ALU). Tradi- 
tionally, the system bottleneck has been the 
control path, especially the critical paths asso-
ciated with conditional branching. Special care
has been taken in the design of the Am29300 
family to balance control and data-path timing. 

A key device contributing to the improved 
control-path timing is the Am29331 16-bit mi- 
croprogram sequencer. It is designed for high 
speed, and that speed has been attained by the 
elimination of functions that would slow down 
the microaddress selection and by including the 
test logic and the test multiplexer in the se-
quencer (Fig. 5). As in most previous-generation
sequencers, the address register, the incre- 
menter, the address multiplexer, the stack, and 
the counter are standard functions. The se- 
quencer has multiway branch instructions that 
allow 1 of 16 consecutive addresses to be se- 
lected as the branch target in a single cycle. 

The address register in most other sequen- 
cers is called a program counter, but this name 
is not correct if a strict definition is applied. In 
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the Am29331, the incrementing counter is 


placed after the address register, which thus al-
lows for the handling of traps. The stack stores
return addresses, loop addresses and loop 
counts. It has 33 levels to permit the deep nest- 
ing of subroutines, loops and interrupts. An 
output, Almost Full (A-Full), indicates when 28 
or more of the levels are in use. 

Available for use in iterative loops, the 
counter can be loaded with an iteration count at 
the beginning of a loop, and the count is tested 
and then decremented at the end of the loop. 




The loop is terminated if the count is equal to 
one; otherwise a jump to the beginning of the
loop is executed. 
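
The test-then-decrement behavior of the loop counter can be sketched as below. The function name is hypothetical and the model ignores the counter's width; it only shows that loading the counter with N makes the loop body run N times.

```python
from typing import Callable


def run_counted_loop(iterations: int, body: Callable[[], None]) -> None:
    """Model of an end-of-loop instruction: test the count, then decrement.

    The loop exits when the tested count equals one; otherwise control
    jumps back to the start of the loop."""
    if iterations < 1:
        raise ValueError("this sketch expects an iteration count of at least 1")
    counter = iterations          # loaded at the beginning of the loop
    while True:
        body()
        terminate = counter == 1  # test first...
        counter -= 1              # ...then decrement
        if terminate:
            break
```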

There are three buses that carry microad- 
dresses. The bidirectional D bus can be con- 
nected to the pipeline register, providing 
branch addresses or loop counts, or used for 
two-way communication with the data process- 
ing part of the system. The A bus, called an al- 
ternate bus, can be connected to a mapping 
PROM to provide starting microaddresses for 
instructions in a computer. The Y bus sends out
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5. To aid in handling trap operations, the incrementer is placed after the address 


register in the Am29331 microsequencer. Additionally, the chip has a 16-bit ad-
dress bus, which enables it to access up to 64 kwords of control memory and han-
dle interrupts and multiple-path branches.
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selected microaddresses to the microprogram 
memory and accepts interrupt or trap address- 
es if interrupt or trap is employed. 

Four sets of 4-bit multiway inputs provide a 
simultaneous test capability of up to 4 bits. 
And, one way to use those inputs would be to 
decode mode bits in changing positions in mac- 
roinstructions. The four select lines select 1 
of 16 tests to be used in conditional instructions. 
There are twelve test inputs. Four of these may 
be used for C (Carry), N (Negative), V (Over- 
flow) and Z (Zero), generating internally the 
tests C+Z, C + Z, N XOR V, and N XOR V+Z,
which are used for comparison of signed and
unsigned numbers.
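
How such flag combinations compare two numbers can be shown with a small model of a subtraction A - B. This is a sketch under an explicit assumption: the C flag here holds a true borrow (1 when A is below B as unsigned values). With the opposite carry convention the unsigned test would use the complement of C. All function names are hypothetical.

```python
def subtract_flags(a: int, b: int, bits: int = 32) -> tuple[int, int, int, int]:
    """Return (C, N, V, Z) for a - b. Assumption: C is a true borrow."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    diff = (a - b) & mask
    c = int(a < b)                               # borrow out
    n = int(bool(diff & sign))                   # result sign
    v = int(bool((a ^ b) & (a ^ diff) & sign))   # signed overflow
    z = int(diff == 0)
    return c, n, v, z


def unsigned_le(c: int, n: int, v: int, z: int) -> bool:
    return bool(c | z)            # C + Z


def signed_lt(c: int, n: int, v: int, z: int) -> bool:
    return bool(n ^ v)            # N XOR V


def signed_le(c: int, n: int, v: int, z: int) -> bool:
    return bool((n ^ v) | z)      # (N XOR V) + Z
```

Generating these combinations inside the sequencer means a single conditional branch can act on a signed or unsigned comparison without extra microinstructions to combine flags.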

Relative addressing was the only somewhat 
useful function that was removed in order to 
maximize speed. The sequencer supports inter- 
rupts and traps with single-level pipelining, but 
may also be used with two levels of pipelining in 
the control path. It has a 16-bit-wide address 
path and cannot be cascaded, which thus limits 
the addressable memory depth to 64 kwords of 
microcode. That, however, is sufficient for the 
vast majority of applications—a typical 
computer, for instance, that has a micropro- 
grammed instruction set, might use only about 
1 to 2 kwords. However, for systems in which 
the microprogram is the sole program level, its 
size is generally larger. 


Microprogram interrupts supported 


The Am29331 sequencer supports interrupts 
at the microprogram level. Like polling, inter- 
rupts handle asynchronous events. However, 
polling requires explicit tests in the micro- 
program for events, thus leading to long re- 
sponse times, lower throughput, and larger mi- 
croprograms. Interrupts, on the other hand, 
have a response time equal to the cycle time of 
the system (approximately 80 ns), measured 
from the Interrupt Request input (INTR). The 
sequencer accepts interrupts at every micro- 
instruction boundary when the Interrupt En- 
able input (INTEN) is asserted. 

An actual interrupt turns off the Y bus driver 
and asserts the Interrupt Acknowledge output 
(INTA), which should be used to enable an ex- 
ternal interrupt address onto the Y bus, thus 
driving the microprogram memory. The inter- 
rupt also causes the interrupt return address to 
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be saved on the stack; this permits nested inter- 
rupts to be handled (Fig. 6). 

The Am29331 is also the first sequencer that 
can handle traps. A trap is an unexpected situa- 
tion caused by the current microinstruction, 
which must be handled before the microin- 
struction completes and changes the state of 
the system. An attempt to read a word from 
memory across a word boundary in a single cy- 
cle is an example of such a situation. When a 
trap occurs, the current microinstruction must 
be aborted and re-executed after the execution 
of a trap routine, which will take corrective 
measures.

Execution of a trap requires that the se- 
quencer ignore the current microinstruction 
and push the trap return address—the address 
of the ignored microinstruction—on the stack. 
The trap address must be transferred onto the 
Y bus at the same time. All this can be accom-
plished by disabling the carry-in to the incre-
menter (Cin) and asserting the Force Continue
input (FC) and the Interrupt Request input 
(INTR). 
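
The difference between the interrupt return address and the trap return address follows from the incrementer sitting after the address register. The simplified model below is a sketch, not a cycle-accurate description of the Am29331: the class and parameter names are hypothetical, and it assumes the address register holds the address of the microinstruction currently executing.

```python
from typing import Optional

ADDR_MASK = 0xFFFF  # 16-bit microaddress path


class SequencerModel:
    def __init__(self, start: int = 0) -> None:
        self.addr_reg = start & ADDR_MASK  # address of current microinstruction
        self.stack: list[int] = []

    def cycle(self, branch_target: Optional[int] = None,
              intr_vector: Optional[int] = None, trap: bool = False) -> int:
        """Advance one microcycle and return the address sent to memory."""
        if trap:
            # Force Continue with carry-in disabled: the incrementer passes
            # the address register through unchanged.
            selected = self.addr_reg
        elif branch_target is not None:
            selected = branch_target & ADDR_MASK
        else:
            selected = (self.addr_reg + 1) & ADDR_MASK
        if intr_vector is not None:
            # Y driver off: the external vector drives the memory and the
            # address the sequencer selected is pushed as the return address.
            self.stack.append(selected)
            fetched = intr_vector & ADDR_MASK
        else:
            fetched = selected
        self.addr_reg = fetched
        return fetched
```

With the sequencer at address 20, an interrupt pushes 21 (the next microinstruction), while a trap pushes 20, so the aborted microinstruction is executed again after the trap routine returns.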

Also built into the sequencer is an address 
comparator, which allows detection of break-
points in the microprogram. An output signal
from the comparator indicates when the con- 
tent of the comparator register is equal to the 
address on the Y bus. There is an instruction 
that loads the comparator register from the D 
bus and enables the comparator, which may lat- 
er be disabled by another instruction. 

Parallel microprocesses are useful when the 
system must deal with peripheral devices that 
are controlled at the microcode level. Normally 
only one processor is present and it must be 
time multiplexed between the concurrent oper- 
ations that must be performed. When a process 
is suspended its private state must be saved, so 
that it can be restored when the process re- 
sumes execution. That, in turn, requires that 
the state of the sequencer be saved and re- 
stored, or each process must have its own 
sequencer that is active when the associated 
process is active. The first approach is the least 
expensive, but the second offers the advantage 
of shorter response time, because no time is 
spent on saving and restoring the state. 

The Am29331 supports the first approach 
with its bidirectional D bus, through which the 
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entire state, with the exception of the com- 
parator register, can be saved and restored. The 
sequencer also supports the multiple sequencer 
arrangement, in which the three-state Y buses 
from the sequencers are tied together driving a 
single microprogram memory. One of the se- 
quencers is active, while the remaining sequen- 
cers are put on hold by asserting their Hold 
inputs. The Hold input disables most outputs 
(the D bus synchronously), disables the incre- 
menter, and enables an internal Force Con- 
tinue. This effectively detaches the sequencer. 




from the system and preserves its state.

The sequencer has a 6-bit instruction input 
that is internally decoded to yield a set of 64 in-
structions. There are 16 basic branch instruc-
tions, each in an unconditional version, a condi-
tional version, and a conditional version with
complemented test. In addition there are 16 
special instructions like Continue and Push C 
(push counter on stack). The branching instruc- 
tions handle jumps, subroutines, various kinds 
of loops and exits out of loops. FC actually
overrides the instruction inputs with a continue







6. Because it can accept interrupts at any microinstruction boundary, the sequencer responds faster than 
most other microprogrammed systems. For example, while the instruction at point A in memory is being 
executed, the sequencer is directed to point B. The only restriction on the programmer is that the first in- 
struction of the interrupt routine cannot use the stack, since the interrupt return address is pushed onto it at
the start of the procedure.
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Microprogrammable 32-bit chips 


instruction. FC is useful in field sharing and 
support for writable microprogram memory. 
The Am29331 is one of the few sequencers 
where the stack is accessible from outside 
through the bidirectional D bus. This indirectly 
allows access to the whole state of the se- 
quencer except the comparator register. This is 
useful when testing the device, and during 
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system debugging, in which, for example, the 
contents of the counter and the stack may be 
examined and altered. By including the trou- 
bleshooting instructions in the microcode, the 
sequencer may aid in debugging itself and the 
rest of the system. The access to the state is also 
useful for changing context or extending the 
stack outside.
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Application Note 


By Mark McClain 


This application note describes the design of a high performance microprogrammed 
32-bit processor using the Am29300 family of 32-bit building blocks. Basic design 
philosophy for a microprogrammed processor is discussed as the design choices 
made for this system are explained. Support circuitry used with the Am29300 family 
components is also covered in detail. This circuitry includes: Writable Control Store, 
Serial Shadow Register diagnostics, and Programmable Array Logic. 
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SECTION 1 
Overview 
This application note describes the design of a high
performance microprogrammed 32-bit processor using
the Am29300 family of 32-bit building blocks.

Basic design philosophy for a microprogrammed proces-
sor is discussed as the design choices made for this
system are explained. Issues of microprogram sequence
control, interrupt handling, microprogram memory op-
tions, microword layout, macroprogramming, and high speed
[illegible] are also covered.

Support circuitry used with the Am29300 family compo-
nents is also covered in detail. This circuitry includes:
Writable Control Store, Serial Shadow Register diagnos-
tics, and Programmable Array Logic.

The use of the following Advanced Micro Devices com-
ponents is illustrated in extensively documented ex-
amples:

Am29331 - 16-bit Address Sequencer,
Am29332 - 32-bit Arithmetic Logic Unit,
Am29334 - 64 x 18-bit Four Port Register File,
Am29C323 - 32-bit Parallel (integer) Multiplier
Accumulator,
Am29325 - 32-bit Floating Point Unit,
Am29114 - Interrupt Controller,
Am29800 - Family of Interface and Diagnostics
Logic Devices,
Am29PL141 - Fuse Programmable State Machine,
AmPAL18P8 - Programmable Output 20-pin Combi-
natorial PAL,
AmPAL22V10 - Output Macrocell 24-pin PAL,
Am9151 - Registered RAM with SSR™,
Am99C165 - 16K x 4-bit CMOS high speed
RAM.
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Figure 1-1. System Components 





SSR is a trademark of Advanced Micro Devices, Inc. 
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SYSTEM LAYOUT 


As with all processors, this system contains three main 
portions: Central Processing Unit (CPU), memory, and 
input/output (I/O) (see Figure 1-1). 


The CPU consists of a control section and a data section:


The data section manipulates data via operations such 
as addition, subtraction, shifting, merging, multiplication, 
and division. These functions are implemented with the 
Am29332 Arithmetic Logic Unit (ALU), Am29325 Float- 
ing Point Processor (FPP), and Am29C323 Parallel 
Multiplier (PM). The data section also stores operands 
and intermediate results in Am29334 register files. 


The control section directs the operations performed by 
the data section and determines the order in which the 
operations are performed. This section contains the 
Am29331 Microprogram Sequencer, macro opcode 
register & decode, interrupt control logic, microcode 
control store, control decoding logic, and control multi- 
plexers for the register file and ALU. 


The memory contains a 16K word by 36-bit static RAM. 
Included as part of the memory block are two address 
registers/counters, which may be used to speed up 
sequential reads and writes made by the CPU. 


The I/O portion is a simple connection to a host system’s 
address and data bus. It is assumed that the Am29300 
demonstration system operates as a peripheral proces- 
sor to a larger host system, as might be the case with an 
array or digital signal co-processor. Information to be 
processed by the demonstration system is loaded into 
the memory portion via Direct Memory Access (DMA). 
When processing of the data is complete, the host 
system unloads the memory portion via DMA. 


A diagnostics port is also provided as part of the I/O 
section. This port allows control over the demonstration 
system clock for single stepping, and it allows for serial 
diagnostics to display and control the state of the system. 


Throughout the remainder of this application note, it is 
assumed that the reader has some previous experience 
with microprogrammed processor design and is familiar 
with the Am29300 family data sheets. For those readers 
not familiar with microprogrammed design, some refer- 
ence material is listed in Appendix A. 


DATA FLOW 


The system data paths are illustrated in the block dia- 
gram of Figure 1-2. 


Memory and I/O Sections 


Information processed by the Am29300 system is ex- 
changed between the host system and the memory via 
the external bus interface. The information may be both 
data and macroinstructions. 


From the external bus, the host system is able to address 
the memory via the bus driver connected to the memory 
address bus. Data is moved over the memory data bus. 
The host system’s only access to the Am29300 system 
is via these buses to the memory. Therefore, all data to 
the system flows through the memory via DMA accesses
by the host system.


Diagnostic control and information flows through the 
external bus interface via the host interface controller. It 
controls the clocking and single stepping of the system 
while loading and reading serial diagnostics via Serial 
Shadow Registers (SSR) that are placed in key locations 
throughout the system. 




Data Section 


Data must be moved from the memory to the register file 
to be available to the ALU and multipliers for processing. 


The register file has four access ports, two ports for
writing data into the file and two ports for reading data out
to the ALU and multipliers. This arrangement allows two
operands to be read from the file in the same cycle as two
operands are being written. The two read operands are
used either as A and B operands for the ALU, FPP, or PM,
or as address and data inputs to the memory.


To move data from the memory to the register file, an
address to the memory is selected from the register file
on the A read port. This address selects a word from the
memory that is transferred on the memory data bus to the
B write port of the register file.


Once data is loaded into the register file, it can then be 
selected for use on either the A or B read ports for input 
to the ALU, FPP, or PM. 


Data processing results from the ALU, FPP, or PM are 
then placed on the Y bus for return to the register file A 
write port. 


Finally, processed data is moved back to the memory via 
the B read port of the register file, while the location to be 
written in the memory is addressed by the value on the A
read port of the register file. 
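
The load, compute, and store sequence described above can be sketched as a small software model. This is illustrative only: the class and method names are hypothetical, the register file depth of 64 words is an assumption, and each method stands for one data movement rather than one timed microcycle.

```python
from typing import Callable

WORD = 0xFFFFFFFF


class DataPathModel:
    """Register file with A/B read ports and A/B write ports, plus memory."""

    def __init__(self, num_regs: int = 64) -> None:
        self.regs = [0] * num_regs           # assumed depth
        self.memory: dict[int, int] = {}

    def load(self, addr_reg: int, dest_reg: int) -> None:
        """A read port supplies the memory address; the memory data bus
        returns the word to the B write port."""
        address = self.regs[addr_reg]
        self.regs[dest_reg] = self.memory.get(address, 0)

    def compute(self, a_reg: int, b_reg: int, dest_reg: int,
                op: Callable[[int, int], int]) -> None:
        """A and B read ports feed the ALU, FPP, or PM; the Y bus result
        returns through the A write port."""
        self.regs[dest_reg] = op(self.regs[a_reg], self.regs[b_reg]) & WORD

    def store(self, addr_reg: int, data_reg: int) -> None:
        """A read port supplies the address, B read port supplies the data."""
        self.memory[self.regs[addr_reg]] = self.regs[data_reg]
```

Because the two write ports are separate, a compute result (A write port) and a memory load (B write port) do not compete for the same port, which is what allows them to overlap in the real system.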
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1K X 92 bits WCS Using Am9151, 
2K X 92 bits PROM Using Am27S75, 
4K X 92 bits PROM Using Am27S85 Address 
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Figure 1-2. Am29300 Demonstration System 


(NOTE: The advantage of using both write ports on the
register file is that it is possible to perform calculations
and write the results via the A write port at the same time
that new data is being moved into the register file from the
memory via the B write port. This will be illustrated in
more detail later in this document.)


Control Section 


D Bus 


The D bus is a highway for information flow between the 
microcode control store, interrupt control, sequencer, and
data section of the CPU. 


Branch addresses or constants from the microcode can 
pass to the sequencer via the D bus. The interrupt 
controller's interrupt vector base address register may 
also be loaded via the D bus. 


Constants from the microcode can pass to the data 
section for use in calculations via the D bus to A bus 
transceiver. Microcode constants can also be used as
addresses to the memory, via a D bus to A bus to memory
address bus connection.


Variable data can be passed from the register file to the 
sequencer. The sequencer can also return data to the 
register file, via the A bus to ALU Y bus to A write port
path. The D bus path to the sequencer is valuable for 
storing and retrieving the state information in the se- 
quencer when interrupts, traps, or context switches 
occur. 


Control Decode 


This section of logic expands encoded microcode fields 
into individual control lines used throughout the system. 


Interrupt Logic 


This circuit monitors interrupt and trap conditions such as
parity errors and breakpoints. When an interrupt condi- 
tion is detected, an interrupt request to the sequencer is 
made and an interrupt address vector generated. 
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Sequencer 


The sequencer is an address multiplexer with an on-chip 
address incrementer and stack. It selects the address for 
each microinstruction word read from the control store. 
The address selected depends on the instruction to the 
sequencer and on the state of test conditions. The 
sequencer can select addresses from the branch field of
the control pipeline register, the macro opcode map, the
internal stack, the increment of the last microinstruction
address, or one of four status condition driven multi-way
branch inputs.
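
The address selection described above can be sketched as a
small Python model. This is only an illustration under stated
assumptions, not the Am29331 instruction set: the instruction
names (CONTINUE, BRANCH, CALL, RETURN, MAP, MULTIWAY), the
Sequencer class, and the rule that a failed test falls through
to the incremented address are all hypothetical. Only one of
the four multi-way inputs is modeled.

```python
class Sequencer:
    """Toy model of a microprogram sequencer (hypothetical instruction names)."""

    def __init__(self):
        self.address = 0      # address of the current microinstruction
        self.stack = []       # on-chip subroutine return stack

    def next_address(self, instr, test=True, branch=0, map_entry=0, multiway=0):
        incremented = self.address + 1
        if instr == "CONTINUE" or not test:
            # Assumption: a failed test falls through to the next sequential word.
            nxt = incremented
        elif instr == "BRANCH":
            nxt = branch                      # branch field of the pipeline register
        elif instr == "CALL":
            self.stack.append(incremented)    # save the return address
            nxt = branch
        elif instr == "RETURN":
            nxt = self.stack.pop()            # address from the internal stack
        elif instr == "MAP":
            nxt = map_entry                   # entry point from the macro opcode map
        elif instr == "MULTIWAY":
            # Four status bits replace the four least significant address bits.
            nxt = (branch & ~0xF) | (multiway & 0xF)
        else:
            raise ValueError(f"unknown sequencer instruction: {instr}")
        self.address = nxt
        return nxt
```

Each call models one microcycle: the instruction and test
condition pick one of the address sources, and the chosen
address becomes the current address for the next call.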


Macro Opcode Support 


Macro vs. Micro Programs: A microprogram is the 
definition for the state of the primary system control 
signals during each system clock cycle. Each word of 
microcode usually has a large number of bits so that 
many parallel operations may be controlled simultane- 
ously. Each microcode word must deal with the intricate 
details of system operation. The writing of microcode is 
a slow tedious process that must take into account every 
facet of system operation in order to provide the most 
efficient use of system resources. 


The advantage of microcode is that, very often, different
system operations can be overlapped (done in parallel)
since there is parallel control over all the system re-
sources.


A “macroprogram” is a series of microcode subroutine 
calls. Each macroinstruction has an opcode field that is 
simply a value that can be translated into the starting 
address of a microcode subroutine within the system 
microprogram. The macroinstruction may include para- 
meters that are passed to the microprogram. These 
parameters might be register addresses, loop counter 
values, immediate data, or memory addresses. 


The advantage of a macroprogram is that the instructions
are very simple and require relatively few bits to define as 
compared to a microcode word. The macroinstructions 
are simpler because all the details of system operation 
are specified by the underlying microcode instructions. 
The simpler instructions allow macroprograms to be 
written much more quickly than microprograms. There-
fore, once a set of microcode subroutines is developed
to perform the most often needed system operations, a 
wide variety of macroprogram applications can be 
quickly written. Macroinstructions remove the system 
programmer’s concern over every detail of system 
operation. 


The disadvantage of a macroprogram is that each in- 
struction must be fetched from memory and decoded 
(translated to a microcode subroutine address) before
each microcode subroutine is executed. When each
subroutine execution is long compared to the overhead 
of fetching and decoding the macroinstruction, the 
macroprogram will run nearly as fast as an equivalent 
microprogram with the advantage being a much easier 
programming task. When the microcode subroutines are 
short compared to the macroinstruction overhead, the 
system speed can drop significantly. 
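
The tradeoff above can be put into rough numbers. The sketch
below is an assumption-laden illustration: it treats the fetch
and decode overhead as a fixed number of cycles added to every
macroinstruction, and the cycle counts are hypothetical values
chosen only to show the two extremes.

```python
def relative_speed(subroutine_cycles, overhead_cycles):
    """Macroprogram speed as a fraction of pure-microprogram speed.

    Each macroinstruction costs `overhead_cycles` (fetch and decode) on top
    of the `subroutine_cycles` spent doing useful work.
    """
    return subroutine_cycles / (subroutine_cycles + overhead_cycles)


# Hypothetical cycle counts chosen only for illustration.
long_routine = relative_speed(subroutine_cycles=38, overhead_cycles=2)
short_routine = relative_speed(subroutine_cycles=2, overhead_cycles=2)
print(f"long: {long_routine:.0%}, short: {short_routine:.0%}")
```

With these numbers the long subroutine runs at 95 percent of
microprogram speed, while the short one drops to 50 percent.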


So, if macroprogramming concepts are used carefully, a 
macroprogrammed approach to system design can yield 
a significant improvement in the ease of system use 
without a large decline in system performance. 


For that reason, the Am29300 demonstration system 
includes the features described below, which allow a 
macroprogrammed approach. These features are in- 
tended to show how basic macroprogramming can be 
implemented. 


Macro Opcode Register: When macro-instructions are 
executed, the instructions are addressed in the memory 
via the A read port of the register file in the same way as 
described earlier for data. The selected instruction is 
read from the memory via the memory data bus and 
written into the macro opcode register. The instruction 
can also be written into the register file via the B write port 
in the same cycle (which may be useful for instructions 
that contain immediate operands that would be used by 
the data section). 


Macro Opcode Map RAM: The macro opcode map 
RAM is made of three Am9150 high speed SRAMs. The
opcode portion of the macro opcode register addresses 
a microcode entry point table in the map RAM. This entry 
point is then used by the Am29331 sequencer as a 
branch address to the microcode routine that performs 
the function required by the macroinstruction. 
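
The opcode-to-entry-point translation can be sketched as a
table lookup. In this minimal sketch the field widths
(OPCODE_BITS, OPERAND_BITS), the position of the opcode in the
upper bits of the word, and the contents of map_ram are all
assumptions made for illustration; the text does not give the
macroinstruction format.

```python
OPCODE_BITS = 8          # assumed width of the opcode field
OPERAND_BITS = 24        # assumed width of the operand field

# Map RAM modeled as a table from opcode to microcode entry point.
# The opcodes and addresses below are made up for the example.
map_ram = {0x01: 0x020, 0x02: 0x048, 0x03: 0x0A0}


def decode_macroinstruction(word):
    """Split a 32-bit macroinstruction and look up its microcode entry point."""
    opcode = (word >> OPERAND_BITS) & ((1 << OPCODE_BITS) - 1)
    operands = word & ((1 << OPERAND_BITS) - 1)
    entry_point = map_ram[opcode]     # branch address handed to the sequencer
    return entry_point, operands
```

Because the map is a RAM, the host can change the entry point
table without touching the macroprogram itself.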


Macro Operands: The operand portion of the macro 
opcode register is loaded into the macro operand count- 
ers. The macroinstruction operands allow the direct 
specification of register file addresses, ALU shift values, 
or ALU field masks to be used by the microcode routines. 


Register File Address, Position, and Width 
Multiplexers: Register file addresses are passed to the 
register file via the register file address multiplexer. Po- 
sition and width information for shift values and field 
masks are passed to the ALU via the position and width 
multiplexers. These multiplexers allow either the microc- 
ode or the macroinstructions to control the register file 
and ALU. 
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SECTION 2 
Nomenclature 


Throughout the remaining figures in this application note,
some naming and drawing conventions are used as
noted below.


All signal names are written as single word identifiers with
underlines used to provide visual space between sec-
tions of a multi-word identifier.


Signals that are active low have names that end with an 
asterisk. In some of this document’s programmable logic 
definition files, this convention is not allowed. In those 
situations, the active low signal names will begin with an 
exclamation point or end with an underline character. 


Clock and qualified clock signals have names that begin 
with CLK_. 


Groups of signals that form buses are shown as single 
lines with an associated number that indicates how many 
lines are involved. Bus lines are drawn with 45 degree 
turns and intersections instead of the usual right angle 
turns and intersections used with individual signal lines, 
in order to highlight buses visually. Major data highways 
such as the A_BUS, B_BUS, and Y_BUS have signal 
names that end in _BUS. The lines of a bus are numbered
from least significant to most significant with the least
significant identified as line zero (0). Where a subset of
the lines in a bus is shown, the bus signal name will be
followed by parentheses containing numbers that show
the range of lines in use. The numbers of a contiguous
range are separated by a colon (:); non-contiguously
numbered lines are separated by a comma (,). Where
lines of a bus are split out to show the specific connection
of bus lines in a circuit, a small number that indicates the
line number within the bus will be shown near each line 
that is split off. 


Four major buses in the system share a common struc-
ture. The A_BUS, B_BUS, Y_BUS, and MD_BUS all
have the same layout. Each bus carries a 36-bit data
word, which is arranged as four 8-bit bytes, each byte
having its own parity bit. Byte zero (least significant) is
located in bits 0:7; bit 32 is the parity bit for byte zero. Byte
one is in bits 8:15 with its parity in bit 33. Byte two is in bits
16:23 with parity in bit 34. Byte three is in bits 24:31 with
parity in bit 35.
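
The bus layout can be illustrated with a short Python sketch
that packs a 32-bit value into the 36-bit bus word and checks
it again. The bit positions follow the description above. The
parity sense is an assumption: the text does not say whether
the buses use even or odd parity, so even parity is used here
purely for illustration, and the function names are
hypothetical.

```python
def byte_parity(byte):
    """Even parity bit: 1 when the byte has an odd number of 1 bits (assumed sense)."""
    return bin(byte & 0xFF).count("1") & 1


def pack_bus_word(data):
    """Build the 36-bit bus word: data in bits 0:31, byte parity in bits 32:35."""
    word = data & 0xFFFFFFFF
    for n in range(4):
        byte = (word >> (8 * n)) & 0xFF
        word |= byte_parity(byte) << (32 + n)   # parity for byte n sits in bit 32+n
    return word


def parity_errors(word):
    """Return the list of byte numbers whose stored parity bit does not match."""
    return [n for n in range(4)
            if byte_parity((word >> (8 * n)) & 0xFF) != (word >> (32 + n)) & 1]
```

A single flipped data bit shows up as an error in exactly one
byte, which is what lets a parity checker report the failing
byte lane.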


Signals that come directly from the microcode memory 
pipeline register have signal names that begin with “P_”. 


Ground symbols (zero volt points) are drawn as down- 
ward pointing triangles, or the signal name GND is used. 


Points tied to +5 volts are labeled with the signal name 
VCC.

Components are shown with pin numbers immediately 
outside the rectangle that defines the component. 
Component-specific signal names related to component 
pins may be shown immediately inside the component 
rectangle. Where there are several components shown 
on a page with very similar connections, only one of the 
components will have pin numbers and signal names 
shown. The remaining components on the page are 
wired in the same manner. 


Each component is assigned and labeled with a “U 
number” that uniquely identifies the component. This 
helps identify specific components for discussion and 
separates identical type devices in the system compo-
nent list.


Because this demonstration system is complex by na- 
ture, it must be illustrated with many figures, each focus- 
ing on a different portion of the overall system. In order to 
show the signal interconnections between all parts of the 
system, each signal that leaves or enters a figure is given 
a name. Often the names are abbreviations in order to
save space in the figures. Each name shows a relation- 
ship to the signal’s use. Wherever the same signal name 
appears in different figures, a connection between the 
figures is defined. To help in identifying all the figures to 
which a signal travels, there is a signal-to-figure cross 
reference listing in Appendix 8B. 
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SECTION 3 
Data Section Description 


REGISTER FILE 


Two Am29334 register files are used in tandem to pro- 
vide a 64-register by 36-bit wide file. This allows the 
storage of 32-bit data plus parity (1 parity bit/byte). Each 
Am29334 contains 64 registers that are 18 bits wide; see 
Figure 3-1. 


An Am29334 register file can both read and write data in 
the same cycle, but it does not perform the read and write
simultaneously. The read must be performed during part 
of the system cycle and the write during another part of 
the cycle. Since read data is needed by the ALU and 
multipliers as early in the cycle as possible and, since 
data values to be written are only available later in the 
cycle, the reading of data is done in the first half of the 
cycle and the writing done in the second half of the cycle. 
A convenient way to separate the two parts of the cycle 
is to use the system clock signal to control the internal 
address mux and write enable. 


As connected in Figure 3-1, the read port latch enables
(LEA and LEB) and write port common enables (WEAC*
and WEBC*) are tied to the data section clock line
(CLK_D). This causes read data to be accessed while
CLK_D is high and read data to be latched when CLK_D
is low. Data is written when CLK_D is low if the port write
enables are active (WEAL* and WEAH*, or WEBL* and
WEBH*). The high and low byte write enables for each
port are tied together since only full 36-bit word writes will
be done in this system.
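
The two-phase use of the clock can be sketched as a behavioral
model. This is a simplified illustration, not a cycle-accurate
description of the Am29334: only the A port is modeled, the
class and method names are hypothetical, and the write enable
is shown as an active-high boolean even though the hardware
signals are active low.

```python
class RegisterFile:
    """Toy model of a 64-word register file with read latches (not cycle accurate)."""

    def __init__(self):
        self.regs = [0] * 64
        self.latch_a = 0          # value held by the A read port latch

    def clk_high(self, read_addr_a):
        # First half of the cycle: the read latch is open and follows the array.
        self.latch_a = self.regs[read_addr_a]
        return self.latch_a

    def clk_low(self, write_addr_a, y_bus, write_enable):
        # Second half: the latch holds the read data while the write happens.
        if write_enable:
            self.regs[write_addr_a] = y_bus
        return self.latch_a
```

Because the latch holds the value read in the first half, a
register can be read, modified, and written back in the same
cycle without the new value disturbing the data seen on the
read port.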


The various read and write addresses are provided from 
the register file address multiplexers, which will be cov- 
ered later. 


The output enable (P_OEA*) and write enables 
(P_WEA* and P_WEB*) come directly from the microc- 
ode pipeline register. 


ARITHMETIC LOGIC UNIT 


Am29332 


The Am29332 provides a 64-bit funnel (barrel) shifter, 
32-bit mask generator, and 32-bit ALU. The ALU can 
perform binary and BCD add or subtract, multi-cycle 
multiply or divide, and logical operations. This single, 
highly-integrated chip provides the complete function of 
the ALU block in this system. The only added component
is an external register used to maintain status bits for the
macroprogram separate from status information used by
the microprogram. The ALU is shown in Figure 3-2.


Most of the control lines come directly from the microc- 
ode control pipeline register. 


The ALU output enable (ALU_OE*) is decoded from the
control pipeline register.


The POSITION and WIDTH signals come from the posi- 
tion and width multiplexers. These multiplexers select 
the position and width values from either the microcode 
pipeline or the macroinstruction in the macro opcode 
register. 


The slave mode input is tied to ground since there will be 
no use of the slave mode comparisons in this system. 


The HOLD input is used as an enable control over the 
clocking of the internal micro status register and Q 
register during times the ALU is not in use. Because the 
ALU, FPP, and PM share the same data source and 
destination buses (A_BUS, B_BUS, and Y_BUS), they 
generally cannot be used simultaneously due to bus 
contention. In recognition of this, the control fields for the 
ALU, FPP, and PM have been overlapped in the microc- 
ode to minimize the required width of each microcode 
word. This means that at certain times the control lines to 
the ALU will be meaningless to the ALU because the 
values on the lines are determined by the needs of the 
FPP or PM. Therefore, unless the hold input is used to 
prevent clocking of the status and Q register during these
times, the ALU status could be lost whenever the FPP or 
PM are in use. 


Note, however, that the hold input is not used as the 
general means to prevent clocking of the ALU registers 
when the whole system is halted (e.g., during single step 
mode). The data clock (CLK_D) that is distributed 
throughout the data section of the CPU is a qualified clock 
and will be used to control the state change of all registers
in the data section, including those in the ALU at times 
when the whole system is halted. 


Macro Status Register 


There are two levels of status information that the pro-
grammer of a microprogrammed system must track if that
system executes macroinstructions. These are referred
to as the micro and macro status. The micro status of the
system is updated at the end of each microcode step and
is part of the system state. The macro status is part of the
macroprogram state as reflected at the end of each 
macro step. Since many microinstructions may be exe- 
cuted to perform the function defined by a given macro- 
instruction, the macro status reflects the machine state 
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from the macroprogram viewpoint. The macro status 
may be carried across many microinstruction cycles 
without change. This requires a separate register to 
contain the macro status independent of the micro status.
The Am29332 does not have an internal macro status 
register so one must be provided externally. The loading 
of the macro status register and the use of the macro 
status information by the microprogram must be con- 
trolled by microcode. The Am29332 does provide an on- 
board multiplexer to select between the micro and 
macro status inputs. Only the carry and link values are 
used directly by the Am29332 since these are the only 
status values normally used to modify data values. The 
macro status for the zero, sign, and overflow flags can
be used by the sequencer as test conditions for branch 
instructions. 


The register used for holding macro status is an 
Am29818-1. The register is loaded (clocked) by a quali-
fied clock called CLK_MAC_STAT. This clock is qualified
by the load macro status bit in the control pipeline 
register. The Am29818-1 is also used to provide a 
diagnostic ability to read and load the macro status 
register through the use of an internal serial shadow 
register (SSR). 


FLOATING POINT PROCESSOR 


Am29325 


The Am29325 Floating Point Processor (FPP) performs 
32-bit floating point multiplication, addition, or subtrac- 
tion in a single cycle. Floating point division can be done 
in seven cycles using the Newton-Raphson method. The
FPP is shown in Figure 3-3. 


All the control lines for the FPP are driven directly by the 
microcode pipeline register with the exception of the FPP 
output enable and the register flow-through enables. 
Those signals are decoded from the data path select field 
of the microcode pipeline register. The output enable 
decode is done by the AmPAL22V10 in Figure 3-3. The
register flow through enable decode is done by the 
control decode logic which is described later. 


It should be noted that the Am29325 is not a full-fledged
member of the Am29300 family. It is different from the 
other Am29300 members with regard to three key char- 
acteristics: it is slower, does no data bus parity checking 
or generation, and has no slave mode capability. 


The Am29325 flow through calculation time is 100 to
125 ns rather than the 42 or 70 ns for the ALU or PM
(the current PM is at 120 ns, but the fastest version will
be at 70 ns). This requires that whenever the FPP is
used, the system clock cycle must be extended to allow 
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for the slower propagation time. This extended clock 
timing is covered later in more detail. 


The lack of parity checking is not much of a problem for 
the rest of the system since it only affects the data 
integrity of information going through the FPP. The lack 
of parity generation isn’t a problem as long as only the 
FPP is working on the data. The problem starts when
floating point data is moved back to memory or is con-
verted to integer values for use by the ALU.


If data from the FPP is read by the ALU or PM, parity
errors will be detected and a system interrupt may 
result. That problem can be avoided if the system has 
kept track of which data resulted from FPP calculations 
and if the parity errors are ignored when that data is 
read. But if FPP data results are moved directly to the 
memory and then on to the host system, the parity errors 
will eventually be found. 


So some means of adding parity generation to the FPP 
should be provided. One way is to add four 8-bit parity 
generator chips to the FPP output bus. This consumes 
power and board space while providing a benefit only
when FPP data is moved directly through the register file 
to the memory. A better way is to use the parity genera- 
tors already available in the Am29332 by requiring that 
FPP data be passed through the ALU before being 
moved to the memory. Even though the data may not be 
modified by the ALU, correct parity will be generated on 
the ALU output. 


With the use of a little trick, there is a way to provide parity
checking on the FPP data inputs. To do this, one of the
data path select codes is used to control the output
enables of both the ALU and FPP. This code (P_DSP =
11) causes the FPP outputs to be disabled and the ALU
outputs enabled, even though the data path selected is
the FPP. By turning on the ALU outputs, the ALU parity
error output will also be enabled and any parity error on
the A_BUS or B_BUS will be reported. At the same time,
the control microcode for the FPP is still valid and may be
used to load registers with the data present on the
A_BUS and B_BUS. Of course the register file should not
be loaded from the Y_BUS in the cycle where this
scheme is used because the ALU is driving nonsense
information onto the Y_BUS. Enabling the ALU outputs
is only a trick used to make the ALU parity checker results
available for this scheme. Note that the ALU hold input
remains active even though the ALU output enable is
active. This prevents any state change in the ALU when
the FPP is the data path actually in use.


Finally, the issue of no slave error checking is unimpor- 
tant, since the slave mode is not used in this system. 
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FPP External Status Register 


Status Pipeline Issue 


The FPP status flags appear at the status outputs along 
with data at the Y outputs. If the FPP “F” register is made 
transparent, the status flag register is also transparent. If 
the F register is clocked, so is the status register. In this 
demonstration system this presents a problem. 


Normally, status conditions from the data section are 
registered before being used by the control section. This 
maintains the pipelined, parallel operation of the control 
and data sections. The control section bases its testing 
on registered status from the last data section cycle 
rather than being forced to wait for status results of the 
current execution cycle before determining the next 
microinstruction to execute. 


To provide the same system for the FPP requires an 
external status register for cycles in which the F register 
is transparent to allow results to pass directly to the 
register file. In that situation the status flags are not 
registered by the FPP and thus, without an external 
register, there is no place to pipeline the status for the 
control section. 


Multiple Status Flag Test Issue 


Several of the FPP status flags signal events of equal 
importance such that it would be a convenience to be 
able to test multiple flags in a single cycle rather than 
basing branches on only one flag at a time. 


A simple way to test multiple conditions at one time is to 
execute a multi-way branch based on the bits being 
tested. In the case of the FPP there are six flags, too 
many for a single multi-way branch which can be based 
on only four bits. A solution is to OR some of the flags 
together as one of the multi-way branch bits and use the 
remaining bits directly as part of the multi-way branch 
address. In that way, one multi-way branch can test all 
six flags. 


When testing the status, if no flags are active, no abnor- 
mal condition exists, and the zero value destination of the 
multi-way branch continues. If one or more of the direct 
flags is active, the multi-way branch goes straight to a 
routine to handle the problem. If one of the ORed flags is 
active, the multi-way branch destination instruction can 
either ignore the flags or take a second multi-way branch
that is based on direct inputs of the flags that were ORed 
in the first multi-way branch (an advantage of having 
more than one source for multi-way branch conditions). 
The second multi-way branch determines which of the 
ORed flags was active in the first multi-way branch. 
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FPP Status Register implementation 


An AmPAL22V10 Programmable Array Logic device is 
used to register the FPP status flags and perform the OR 
of some of the flags. 


This external status register loads new status only as the 
result of cycles in which the FPP is the selected data path 
during an instruction execution. When the FPP “F” regis- 
ter is in transparent mode, the external status register is 
loaded with the flags at the end of an FPP cycle. This 
results in a one level deep pipeline on status in the same 
way that ALU status is pipelined one level internal to the 
ALU. When the F register is in clocked mode, the external
status register will load in the cycle following an FPP
cycle. This will capture the data that is loaded into the
FPP on-chip status register at the end of the FPP cycle.
This causes the status to be double pipelined for cycles 
in which the F register is clocked. 


The multi-way branch outputs for the first level branch are
the following flags: Overflow, Underflow, Invalid, and the
OR of the Inexact, NAN, and Zero flags. The multi-
way branch outputs for the second level branch are:
Inexact, NAN, Zero, and Ground.


These groups of four bits are substituted for the least 
significant four bits of a branch address to act as a multi- 
way branch. 
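
The two-level test can be sketched in Python. The grouping of
flags follows the description above (three direct flags plus
the OR of Inexact, NAN, and Zero in the first level; Inexact,
NAN, Zero, and a grounded input in the second). The order of
the flags within each four-bit group is an assumption, since
the text does not state which flag drives which address bit,
and the function names are hypothetical.

```python
def first_level_bits(flags):
    """Four branch bits: Overflow, Underflow, Invalid, OR of the minor flags."""
    minor = flags["inexact"] or flags["nan"] or flags["zero"]
    return (flags["overflow"] << 3 | flags["underflow"] << 2
            | flags["invalid"] << 1 | int(minor))


def second_level_bits(flags):
    """Four branch bits: Inexact, NAN, Zero, and a grounded (always 0) input."""
    return flags["inexact"] << 3 | flags["nan"] << 2 | flags["zero"] << 1 | 0


def multiway_target(branch_address, bits):
    """Substitute the four bits for the least significant four address bits."""
    return (branch_address & ~0xF) | (bits & 0xF)
```

With no flags set the first-level branch lands on the zero
entry of the 16-way table, so the normal path continues
without any extra test instructions.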


In addition to the multi-way branch test for flags, an added 
output of the status PAL ORs together the Overflow, 
Underflow, and Invalid flags for use as an interrupt signal 
to the system interrupt controller, thus giving one addi- 
tional way to monitor the FPP error flags. Using the 
interrupt approach eliminates the need to follow floating 
point operations with multi-way branches in order to test 
for error conditions. Execution of instructions can pro-
ceed, assuming no major problems exist in an FPP cycle.
If one of the above mentioned error flags is active, the
resulting interrupt will deal with the error. 


One last element of the status PAL is that it acts as part 
of the system control decode by decoding the data path 
select bits of the control pipeline to enable the FPP output 
when the FPP is the selected data path. 


The logic definition file for the status PAL is listed in 
Appendix C. 


Seed Look-Up Table 


The Newton-Raphson division algorithm does a division
of A by B by finding the inverse of B (i.e., 1/B) and
performing a multiply against A. This scheme works with
the Am29325 since finding the inverse of B requires only
a series of multiplies and subtracts which the Am29325
can do in single cycles. But, these multiplies and sub-
tracts are performed only to refine the accuracy of a
precalculated seed value (a rough approximation of the
inverse of B). So a table of seed values must be available
to support division with the Am29325.


This seed table is stored in PROM memory external to the
FPP. The B variable is used to address the seed table, 
and the resulting seed value is fed into the FPP to be 
refined. 
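
The seed-and-refine scheme can be sketched in Python. This is
a minimal illustration of Newton-Raphson reciprocal refinement,
x = x * (2 - B * x), not the microcode used in this system.
The table size (SEED_BITS), the way the table is indexed from
the leading fraction bits, and the iteration count are all
assumptions, and the sketch works in double precision rather
than the 32-bit format of the Am29325.

```python
import math

SEED_BITS = 4   # assumed table size: 16 entries indexed by leading fraction bits

# Each entry approximates 1/m for mantissas m in one slice of [0.5, 1).
SEED_TABLE = [1.0 / (0.5 + (i + 0.5) / 2 ** (SEED_BITS + 1))
              for i in range(2 ** SEED_BITS)]


def reciprocal(b, iterations=3):
    """Approximate 1/b using a table seed refined by Newton-Raphson steps."""
    if b == 0:
        raise ZeroDivisionError("zero divisor")   # checked during divide setup
    m, e = math.frexp(abs(b))                     # abs(b) = m * 2**e, 0.5 <= m < 1
    index = int((m - 0.5) * 2 ** (SEED_BITS + 1))
    x = math.ldexp(SEED_TABLE[index], -e)         # rough estimate of 1/abs(b)
    for _ in range(iterations):
        x = x * (2.0 - abs(b) * x)                # multiply, subtract, multiply
    return math.copysign(x, b)


def divide(a, b):
    """A / B computed as A * (1/B)."""
    return a * reciprocal(b)
```

Each refinement step roughly squares the relative error of the
estimate, which is why a coarse seed table is enough to reach
full precision in a few steps.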


Placing the seed table in the path to one of the FPP inputs
normally requires a 32-bit multiplexer to select between 
the PROM and the direct input bus for loading normal 
operands in multiply, add, and subtract operations. Build- 
ing this multiplexer would require at least six hex-2-to-1 
multiplexer chips. The PROM and multiplexer would also 
increase the propagation time needed to load the FPP, 
thereby requiring the cycle timing to be extended even 
more than is already required by the FPP. 
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The implementation of the seed table in this system has 
been modified to save chips and cycle length. Instead of
placing the seed table between the A_BUS and the FPP,
it is placed to the side as an appendage of the A_BUS
(see Figure 3-3). The inputs and outputs of the table are
tied together and to the A_BUS. The internal structure of
the table is shown in Figure 3-4. It contains three
PROMs, each of which is followed by a three-state output
register (the Am27S25 has an internal register). In this
arrangement the PROMs can be accessed by the value
present on the A_BUS in one cycle and the resulting seed
loaded into the registers. In the following cycle the
registers can drive the A_BUS with the seed value. This
scheme requires three fewer chips and no extension to 
the FPP cycle time. It is true that two cycles are now 
required to load the seed value but the cycle used to 
access the seed table can be combined with the 
operation of checking for a zero divisor. This operation is 
generally done during the setup for a divide. 
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Figure 3-4. Floating Point Block Seed Look-Up Table -- Data Flow Diagram 
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The detailed connections of the seed table are shown in 
Figure 3-5. The Am27S25 contains the seed values for 
the exponent and the two Am27S43s contain the seed for 
the fraction. The seed table output enable (SEED_OE*)
signal is a decoded output of the microcode control 
pipeline register. The output register of the seed look-up 
table is clocked by the data section clock. 


PARALLEL MULTIPLIER 


The entire Parallel Multiplier (PM) block’s function is 
provided by the single chip Am29C323 Parallel Multi- 
plier. This chip performs 32-bit, 64-bit, 96-bit, and 128-bit 
integer multiplies. It also can perform multiply accumu- 
late using an internal 67-bit accumulator. The PM is 
shown in Figure 3-6. 
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CLK_D LU> 




Most of the control signals come directly from the control 
pipeline register. The Parallel Multiplier output enable 
(PM_OE*) is decoded from the data path select field of 
the microcode pipeline register. The enable and flow 
through controls for the instruction register (ENI* and 
FTI) are tied respectively to GND and VCC to allow
instructions to flow directly from the microcode pipeline 
register to the multiplier, since the microcode pipeline 
register already provides the one level of pipeline re- 
quired in the system. The flow through enable on the 
product register is enabled only when the PM data path 
is selected via the control decode logic. 
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Figure 3-5. Floating Point Block Seed Look-Up Table -- Implementation 
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Figure 3-6. Integer Multiplier Block 








CHAPTER 6 
Articles/Application Note 





SECTION 4 
Memory and External System Interface 


The memory block and external system interface are 
discussed together in this chapter because of the tight 
interconnection between these areas. It is helpful to view 
the two blocks together in order to understand the shared 
use these blocks make of the memory address bus 
(MA_BUS) and the memory data bus (MD_BUS). Fig-
ure 4-1 shows a block diagram of the data and address
paths used in these sections. 


One thing to note is that both the memory and the 
external interface are not elaborate in design. Essentially 
the external I/O section of this system is just a second 
port on the system memory. This system does little more 
than provide a simple arbitration scheme on access to 
the memory that allows an externally supplied DMA 
device to load and retrieve data from the memory. Event 
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or interrupt signaling between the CPU and host system 
is limited to a single pair of interrupt signals, one from host 
to CPU, one from CPU to host. Memory itself is only a 
simple bank of static RAM with two address counters on 
the input that help speed up array calculation. 


The reason for this simple approach is that the design of
the CPU using the Am29300 family of building blocks is
the focus of this application note. Every reader who may
find the information in this application note useful will
have different memory and I/O requirements to handle
and will very likely design individual approaches to mem-
ory and I/O. Therefore, only this simple approach is
covered here so that more time can be spent discussing 
the CPU design. 
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Figure 4-1. Memory and External Interface Address and Data Paths 







EXTERNAL BUS INTERFACE CONTROL 


Host Access Definition 


A block diagram of the host interface controller and its 
connection to the MA_BUS and MD_BUS buffers is 
shown in Figure 4-2. 


The Am29300 demonstration system is treated as a co- 
processor to some host system. It ultimately gets all of its 
instructions, data, and control from the external host 
system. To provide communication with the host using a 
minimum of design effort and special hardware, only two 
portals into the Am29300 system are allowed. 


One portal is the Am29300 memory, which is treated as 
a dual port memory with all words directly mapped into
the host bus address space. With this, the host has
complete access to macroinstructions and data going
into and out of the system.


The second portal is a serial diagnostics shift chain that
runs through key control registers of the system. This 
serial pathway gives access to loading and reading the 
microcode writable control store, to the control pipeline 
register, to loading and reading the macro opcode map 
RAM, to the macro opcode register, to the macro status 
register, and to the interrupt base address register. 





Through this serial port, the microinstructions are loaded 
by the host before program execution begins. Also, the 
system clocks can be controlled by the host to allow 
diagnostics and code debugging via single stepping and 
breakpoints. 


These portals are controlled by a state machine that is 
separate from the Am29300 system. The state machine 
is referred to as the host interface controller. It constantly
monitors the external host address bus. When the host
presents an address that matches a preset address on
the Am29300 system board, the host interface controller 
is selected to perform one of several interface functions. 


Any function requested by the host takes priority over 
anything that the Am29300 CPU is doing. The host 
always gains control of the memory address and data 
buses as soon as the CPU clocks can be stopped and the 
CPU to memory bus buffers disabled. 


The function performed is dependent on the address
used; thus the commands from the host to the interface
controller are memory mapped. A 24-bit address from the
host is assumed for this design. The 6 most significant
bits (23:18) of the address are matched to the Am29300
system board address to select the host interface
controller. The next two most significant bits (17:16) are
used to select a command mode. The 3 least significant
bits (2:0) are used to select a specific command function
within two of the command modes.


Figure 4-2. Host Interface Block Diagram
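

The address fields described above can be illustrated with a
short software sketch. It is only a model of the decoding, not
the hardware implementation; the board address value and the
function name are assumptions made for the example.

```python
BOARD_ADDRESS = 0b101010  # hypothetical 6-bit board address, not from the text


def decode_host_address(addr, board_address=BOARD_ADDRESS):
    """Split a 24-bit host address into (selected, mode, function)."""
    addr &= 0xFFFFFF                                   # keep 24 address bits
    selected = ((addr >> 18) & 0x3F) == board_address  # bits 23:18
    mode = (addr >> 16) & 0x3                          # bits 17:16
    function = addr & 0x7                              # bits 2:0
    return selected, mode, function


# Board selected, command mode 2, command function 3.
example = (0b101010 << 18) | (0b10 << 16) | 0b011
print(decode_host_address(example))
```

The function field is returned for every address, but it is
meaningful only in the two command modes that use it.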


Host Interface Block Diagram 


The 6 most significant bits of the host address are 
checked by the address recognition block: if the address 
matches the board address, then the match signal is fed 
into the input of a synchronizing register. Also fed into this 
register are: the external bus write enable line 
(EXT_WEN*); the external address bits 17, 16, 2:0 
[EXT_ADD(17,16,2:0)]; and the host system reset line. 


The synchronizing register is clocked by a free-running 
version of the Am29300 system clock. The register used 
has special meta-stable hardened circuitry that prevents 
the outputs from oscillating, regardless of the timing 
relationship of input data to clock. This register allows the 
entire Am29300 system to run asynchronously with 
regard to the host system clock. All the interaction be- 
tween the host system and the Am29300 system is 
synchronized to the Am29300 system clock by the regis- 
ter. Each command to the host interface controller is thus 
presented at the output of this register in synchronization 
with the host interface controller clock.


The heart of the host interface is an Am29PL141 Fuse
Programmable Controller. It is a microprogrammed
sequencer with on-chip microcode memory and pipeline
register. This sequencer implements the state machine
functions needed to control the interaction between the
host and the Am29300 system. Used with the
Am29PL141 is an Am22V10 PAL. This PAL collects
together some glue logic functions: an interrupt signal
latch, a multiplexer, and some encoding logic, all of which
are described later.


The Am29PL141 provides control signals to the clock 
gating and distribution section of the Am29300 system. It 
also controls the enabling of all the buffers and transceiv- 
ers that connect with the MA_BUS and MD_BUS. The 
controller acts as a “traffic cop” that allows only one driver 
on those buses at a time to prevent contention. The 
controller also manages the loading, reading, and shift- 
ing of the Serial Shadow Register diagnostic chain. 


The Serial Shadow Register (SSR) diagnostics port is a
32-bit-wide parallel read and write register that also
functions as a shift register. Data to be read or written to
the SSR diagnostic chain is loaded or read via this port.
The port is connected to the host via the MD_BUS. The
port is built from four Am29818-1 SSR diagnostic pipeline
registers. These registers, like all the registers in the
diagnostics chain in this system, contain one normal
parallel input and output pipeline register that is backed
up or “shadowed” by a second parallel input and output
register that also acts as a serial shift register. The
pipeline register can be loaded from the shadow register
and the shadow register can be loaded from the outputs
of the pipeline register. This gives the ability to move data
into or out of the pipeline register via the shadow register.
Data in the shadow register can be serially shifted to
other similar registers in the system. By connecting all the
diagnostic serial shadow registers together in a serial
chain, data can be moved serially through a large number
of key registers in the system using very few wires.


Figure 4-3. Host Interface Controller


The SSR diagnostics port is just an extra section of the 
diagnostics chain that runs throughout the Am29300 
system. This extra section is connected to the MD_BUS 
to serve as a parallel input and output port that gives 
access to the serial shadow register chain.


A slightly more detailed view of the Host Interface
Controller is shown in Figures 4-3 and 4-4.


Event Signals 


The host and the Am29300 system need to be able to
signal each other when important events occur, such as 
the transfer of ownership over sections of the dual port 
memory. To allow this, a simple interrupt setting and 
clearing scheme is provided. 


The host interrupts the Am29300 system with a com- 
mand to the host interface controller. The controller in 
turn sets an interrupt flag in the Am29300 system inter- 
rupt controller. The interrupt is cleared when the 
Am29300 services its interrupt controller. 


The Am29300 interrupts the host by using a microcode
bit to set a latch that drives an interrupt line on the external
bus. The interrupt is cleared whenever the host does an
operation on the SSR port. The interrupt latch is
implemented in the AMPAL22V10, as shown in Figure 4-4.
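

The behavior of the host interrupt latch can be sketched as a
small model. This is an illustration only: the class and
argument names are hypothetical, and giving the clear input
priority when both inputs are active in the same clock is an
assumption, not something stated in the text.

```python
class HostInterruptLatch:
    """Toy model of the latch that drives the external interrupt line."""

    def __init__(self):
        self.ext_intr = False  # state of the interrupt line to the host

    def clock(self, microcode_set=False, ssr_port_access=False):
        # Assumption: a host operation on the SSR port wins over a set.
        if ssr_port_access:
            self.ext_intr = False
        elif microcode_set:
            self.ext_intr = True
        return self.ext_intr


latch = HostInterruptLatch()
latch.clock(microcode_set=True)      # Am29300 microcode raises the interrupt
latch.clock()                        # line stays set until the host responds
latch.clock(ssr_port_access=True)    # host touches the SSR port: cleared
```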


[Diagram: signals shown include EXT_INTR and SYS_MEM_EN*.]

Figure 4-4. U17 Am22V10A Host Interface Glue Logic


Memory Enable


The Am29300 system memory can be enabled by 
either the Am29300 microcode or by the host interface 
controller. A simple multiplexer is needed to direct the 
correct control signal to the memory enable input. This 
logic is also implemented in the AMPAL22V10 shown 
in Figure 4-4. 


AMPAL22V10 Support Logic 


Figure 4-4 shows the logic for the AmPAL22V10 that
integrates the interrupt signal latch, SDI multiplexer, and
memory enable logic. The logic equation definition file for
this PAL is listed in Appendix D.


SSR Diagnostics 
SSR Shift Path 
Figure 4-5 shows a block diagram of how the serial
shadow registers in the system are linked together and
how they relate to the macro opcode map RAM,
sequencer, and microcode control store. Most of these
registers are also depicted in other figures throughout
this application note in their roles as parallel input and
output pipeline registers. Figure 4-5 emphasizes the
serial in and out and control connections of the shadow
registers also contained in these registers.


The SSR diagnostics port is shown as the starting and
ending point for the entire shift chain (or loop as seen
here). Data to be loaded into the SSR loop is parallel
loaded into this register from the MD_BUS via the
bidirectional outputs of the registers in this port (note: the
shadow register in the Am29818-1 gets its input from the
output pins of the Am29818-1 pipeline register).


Data loaded into this shadow register is then shifted into
one of two branches of the SSR loop. One branch flows
through the Writable Control Store (WCS) port and the
microcode control store pipeline shadow registers. The
WCS port is used to address the microcode control store
or to receive (load) data from (to) the macro opcode map
RAM. The microcode control store shadow register is
used to write data into the microcode writable control
store or to read the contents of the control pipeline
register. The second branch flows through the macro
opcode, macro status, and the interrupt base address
registers. The macro opcode register is used in part to
address the macro opcode map RAM.


Figure 4-5. Serial Diagnostics Shift Path


These branches are separate because it helps to shorten 
the shift chain length by using branches and because the 
shift chain clock to the writable control store and WCS 
port must be separate from the shift clocks to the rest of 
the diagnostics chain. The shift clocks must be separate 
because of the way the writable control store is loaded. 


The data outputs of the control store are connected to the 
inputs of the pipeline register as required for normal use 
in the system. To write the memory, the inputs must be 
driven with the data to be written, turning the input pins 
into outputs. In the Writable Control Store (WCS) pipeline 
register this is fine, since the memory outputs are dis- 
abled during the write. 


If other diagnostic registers in the system were tied to the
same shift clock and mode control lines as the WCS
pipeline, there could be a problem every time the WCS is
written. The other diagnostic registers not involved in the
WCS write would see the same control signals as the
WCS registers and would drive their input pins.
Depending on what the other registers were connected to,
this situation could cause serious contention problems
throughout the system.


For this reason, the SSR used to load WCS is treated
separately from other SSRs in the system. It is
worth noting that the only control signal that need be
separate is the shift clock. The mode and serial path may
be shared with all SSRs in the system. Putting the SSR
into WCS loading mode requires the shift clock to load an
internal mode flip-flop. If the shift clock is active only to the
SSR used for WCS when the MODE and Serial Data In
(SDI) signals are set high, only the WCS SSR will go into
the input pin driving mode.


The end of each branch in the SSR loop returns to a
multiplexer at the serial data input (SDI) of the SSR
diagnostics port. This multiplexer allows the selection of
the shifted branch into the port when the SSR loop is
being read rather than written. It also allows the SDI value
to be forced when the MODE signal is high. When the
MODE signal is high, all the SSRs in the system pass
their SDI directly to their Serial Data Output (SDO). This
causes the SDI value forced at the input of the SSR port
to be passed directly to all SSRs in the system (note:
significant propagation time from SDI to SDO for each
SSR is involved). In this way the forced value of SDI
becomes an additional control signal to all the SSRs in
the system. The function of this multiplexer is integrated
into the AMPAL22V10 as shown in Figure 4-4.


SSR Reading and Writing 


To read the contents of the pipeline registers in the 
Am29300 system, the host must first send a command to 
load the SSR throughout the system from the pipeline 
registers. Then the host must shift the contents of the 
SSR into the SSR port register (up to 32 bits at a time). 
The host then performs a read of the SSR port. The host 
then repeats the shifting-and-reading process until the 
entire SSR chain has been read. 


To write the system pipeline registers, the host reverses 
the above procedure. Data is first written into the SSR 
port. Then the SSR chain is shifted to move data into 
position. The SSR port loading and SSR chain shifting go 
on until the section of the SSR chain desired is filled. 
Finally a pipeline load command is issued by the host to 
load the contents of the SSR into the pipeline registers. 


To write the macro opcode map RAM and the microcode 
writable control store (note: these are treated as a single 
WCS and must be written together), an address for the 
map RAM is first loaded into the macro opcode pipeline 
register via the method described above. Then the ad- 
dress for the microcode WCS is loaded into the WCS port 
pipeline register. Next, the data to be written into the map 
RAM and into the microcode WCS is shifted into the WCS 
port SSR and WCS SSR. A load WCS command is then 
given which performs the actual write of data into the 
memories. During the write operation the output of the 
WCS port is enabled and the Am29331 sequencer output 
is disabled (via its HOLD pin). 


The only trick involved in SSR reading and writing is
knowing how much to shift the SSR during each read or
write. The problem is that the SSR chain length in this
system (and in nearly every real system) is not an even
multiple of the SSR port size. During the first (or last) shift
operation of either the read or the write of pipeline
registers, it will be necessary to shift fewer than the full 32
bits of the SSR port. The number of bits to be shifted
depends on the chain length. One thing to note is that the
chain length will be a multiple of 4 bits because
diagnostic pipeline registers are currently available only
in 4-bit and 8-bit devices. So, when a shift operation is
commanded by the host, the number of nibbles (4-bit
shifts) to be shifted must be indicated.
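

The read and write procedures, including the short first shift,
can be sketched with a toy software model. This is a sketch
under stated assumptions, not the controller's algorithm: the
loop is modeled as a ring holding the 32-bit port followed by a
single branch of shadow bits, the bit ordering is arbitrary, and
all class and function names are hypothetical.

```python
from collections import deque

PORT_BITS = 32  # width of the SSR diagnostics port (from the text)


class SsrLoop:
    """Toy model: the 32-bit port plus one branch of shadow bits, as a ring."""

    def __init__(self, pipeline_bits):
        if len(pipeline_bits) % 4:
            raise ValueError("chain length must be a multiple of 4 bits")
        self.pipeline = list(pipeline_bits)      # normal pipeline registers
        self.chain_len = len(self.pipeline)
        # Ring layout: port bits first, then the shadow bits of the chain.
        self.ring = deque([0] * (PORT_BITS + self.chain_len))

    def load_shadow(self):
        """Copy pipeline registers into their shadow registers."""
        ring = list(self.ring)
        ring[PORT_BITS:] = self.pipeline
        self.ring = deque(ring)

    def load_pipeline(self):
        """Copy shadow registers into the pipeline registers."""
        self.pipeline = list(self.ring)[PORT_BITS:]

    def write_port(self, bits):
        assert len(bits) == PORT_BITS
        ring = list(self.ring)
        ring[:PORT_BITS] = bits
        self.ring = deque(ring)

    def read_port(self):
        return list(self.ring)[:PORT_BITS]

    def shift(self, nibbles):
        """Shift the whole loop by a number of 4-bit nibbles."""
        self.ring.rotate(4 * nibbles)


def chunk_sizes(chain_len):
    """Bits moved per shift: a short first shift, then full 32-bit shifts."""
    first = chain_len % PORT_BITS
    return ([first] if first else []) + [PORT_BITS] * (chain_len // PORT_BITS)


def read_chain(loop):
    loop.load_shadow()
    pieces = []
    for size in chunk_sizes(loop.chain_len):
        loop.shift(size // 4)
        pieces.append(loop.read_port()[:size])   # only `size` bits are new
    # Pieces arrive from the tail of the chain first, so reverse them.
    return [bit for piece in reversed(pieces) for bit in piece]


def write_chain(loop, data):
    end = len(data)
    for size in chunk_sizes(loop.chain_len):
        piece = data[end - size:end]             # tail of the data goes first
        loop.write_port([0] * (PORT_BITS - size) + piece)
        loop.shift(size // 4)
        end -= size
    loop.load_pipeline()
```

For a 40-bit chain, `chunk_sizes(40)` gives one 8-bit (2-nibble)
shift followed by one full 32-bit (8-nibble) shift, which is the
"fewer than 32 bits on the first shift" case described above.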


A final note: during the shifting of the WCS SSR, the 
Am29300 system clocks must be halted. This is due to 
the fact that pipeline clock and shift clock to the Am9151 
may not occur within 65 ns of each other. Since these 
clocks would occur within the above window in this 
system, the pipeline clock must not be active. 


Controller Description 
Function/Command Descriptions 
The following is a list of the address values for functions
that the host interface will perform when addressed by
the host:


ADDRESS BITS
17 16  2  1  0    FUNCTION

 0  0  x  x  x    Am29300 Memory Access
 0  1  x  x  x    Serial Diagnostics Port Access
 1  0  0  0  0    Illegal code
 1  0  0  0  1    Halt CPU
 1  0  0  1  0    Run CPU
 1  0  0  1  1    Single Step CPU
 1  0  1  0  0    Single Step CPU Control Section
 1  0  1  0  1    Single Step CPU Data Section
 1  0  1  1  0    Interrupt CPU
 1  0  1  1  1    Reset CPU
 1  1  0  0  0    Illegal code
 1  1  0  0  1    Load Pipeline Register
 1  1  0  1  0    Load Macro Opcode Register
 1  1  0  1  1    Load Writable Control Store
 1  1  1  0  0    Load Initialization Register
 1  1  1  0  1    Load Serial Shadow Register
 1  1  1  1  0    Shift WCS SSR Chain
 1  1  1  1  1    Shift Macro Opcode SSR Chain


Memory Access: Reading and writing of the Am29300
system memory is done by selecting the address for the
Am29300 system with address bits 16 and 17 equal to
zero. The address for the specific word in memory is
contained in address bits 0:15. The host interface
controller, upon recognizing the host access, will stop the
clocks to the Am29300 system and disable the CPU to
MA_BUS and MD_BUS buffers. At the same time the
external bus to MA_BUS and MD_BUS transceivers are
enabled. This suspends the operation of the Am29300
system and gives memory access to the external host.
The write enable line on the external bus determines
whether a read or write occurs.


Note that by suspending the Am29300 system operation,
the memory access is transparent to (or hidden from) the
CPU. There is no action required on the part of the
Am29300 microcode or interrupt control.


Serial Diagnostics Port Access: This access is very
similar to that of a memory access. The difference is that
the SSR port register is being read or written instead of
memory.


Halt CPU: This command throws the Am29300 system
clocks into a continuous stop condition until the mode is
cleared by the RUN CPU command or temporarily
overridden by one of the single step commands.


Run CPU: This command starts the Am29300 system 
clocks running. 


Single Step CPU: When the CPU is halted, this com- 
mand will cause all the system clocks to cycle once to 
advance the state of the CPU one step. Note that gated 
clocks will be active during this cycle only if their enables 
are active (i.e., gated clocks operate as they would during 
a normal clock cycle; they are not forced to operate). 


This mode is useful during diagnostic operations to single 
step the machine between serial load and unload of the 
SSR diagnostics. 


Single Step CPU Control Section: This will step only 
the clocks in the control section of the CPU. The control 
pipeline, macro opcode, macro operand, status, se- 
quencer, and interrupt registers may be affected. 


This is useful for forcing the control section into a new
state under the control of diagnostics, such as a forced
branch to a new location in the microcode. This is done
by first loading the control pipeline with an instruction to
branch via the SSR diagnostics chain. The control
section would then be single stepped to execute the branch.
Note that during these operations, the data section is not
affected and no data is modified.


Single Step CPU Data Section: This operation single
steps the clocks only in the data section of the CPU. This
may be useful for repetitive diagnostic operations
involving only the data section.


Interrupt CPU: This command causes the host interface 
controller to set an interrupt input to the Am29300 system 
interrupt controller. The interrupt controller in turn priori- 
tizes the interrupt and causes an interrupt to the CPU 
when that type of interrupt is enabled. 


Reset CPU: This will make the reset line to the Am29300
system active and step all the ungated system clocks.
The clocking is required by some parts of the system to
effect reset state changes.


Load Pipeline Register: This command will step only 
the clock to the control pipeline and WCS port for one 
cycle while forcing the pipeline registers to load data from 
the SSR chain. This is used to control the state of the 
pipeline through serial diagnostics. 


Load Macro Opcode Register: This steps only the clock 
to the macro opcode, macro operand, status, and inter- 
rupt base address pipeline registers while forcing the 
registers to load from the SSR chain. 


Load Writable Control Store: This command initiates a 
series of clock cycles that cause data in the SSR chain to 
be loaded into the writable microcode control store and 
the macro opcode map RAM from the SSR chain. The 
address loaded is also specified in the SSR chain. 


Load Initialization Register: Like the previous com- 
mand, this operation loads the writable microcode store. 
The difference is that only the WCS (Am9151) initialize 
registers are loaded from the SSR chain. 


Load Serial Shadow Register: This causes the con- 
tents of all diagnostic pipeline registers to be copied into 
the related SSR chain elements. This is used to read the 
Am29300 system state into the SSR chain so that it can 
be shifted out to the host. 


Shift WCS SSR Chain: This command shifts the
contents of the SSR port register into the SSR diagnostics
chain used for the writable control store. It also brings the
bits at the end of the WCS SSR chain into the SSR port
register. This is the serial read and write operation of the
WCS SSR chain (or loop).


Shift Macro Opcode SSR Chain: This is the same as 
the previous command but it affects the SSR chain 
associated with the macro opcode, status, and interrupt 
base address registers. 


Illegal Code: Due to the way the host interface
controller algorithm was implemented, this command (address
combination) is illegal. If it is used, it will lock up the host
interface controller in an infinite loop.
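

The command list can be summarized as a lookup from the mode
bits (17:16) and function bits (2:0) to a command name. The
sketch below follows the address table in this section; the
dictionary and function names are hypothetical, and raising an
error for the illegal codes is a choice made for the example
(the real controller locks up instead).

```python
# (mode bits 17:16, function bits 2:0) -> command, per the address table.
COMMANDS = {
    (0b10, 0b001): "Halt CPU",
    (0b10, 0b010): "Run CPU",
    (0b10, 0b011): "Single Step CPU",
    (0b10, 0b100): "Single Step CPU Control Section",
    (0b10, 0b101): "Single Step CPU Data Section",
    (0b10, 0b110): "Interrupt CPU",
    (0b10, 0b111): "Reset CPU",
    (0b11, 0b001): "Load Pipeline Register",
    (0b11, 0b010): "Load Macro Opcode Register",
    (0b11, 0b011): "Load Writable Control Store",
    (0b11, 0b100): "Load Initialization Register",
    (0b11, 0b101): "Load Serial Shadow Register",
    (0b11, 0b110): "Shift WCS SSR Chain",
    (0b11, 0b111): "Shift Macro Opcode SSR Chain",
}


def decode_command(mode, function):
    """Return the command name for a mode/function pair."""
    if mode == 0b00:
        return "Am29300 Memory Access"           # function bits are don't-care
    if mode == 0b01:
        return "Serial Diagnostics Port Access"  # function bits are don't-care
    try:
        return COMMANDS[(mode, function)]
    except KeyError:
        # Function 000 in either command mode is an illegal code.
        raise ValueError("illegal code") from None
```

A host-side driver could use such a check to refuse the two
illegal address combinations before they reach the bus.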


Access Timing 


The speed of interaction between the host and the 
Am29300 system is regulated by both the host and the 
host interface controller. 


Once the Am29300 system is addressed by the host, the
host interface controller holds the external bus by driving
EXT_READY inactive. This continues until the host
interface controller completes the command requested. The
EXT_READY signal is then made active and held active
until the host stops addressing the Am29300 system. At
that time, the host interface controller recognizes that the
host has completed the transaction and the
EXT_READY line is again made inactive.


In this fashion, either the host interface controller or the
host can extend the length of the external bus transaction
as required. The signal timing between the host and the
host interface is treated as asynchronous. The timing of
the host interface itself is synchronous with the Am29300
internal clock cycle.
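

The EXT_READY handshake can be sketched as a three-state
machine. This is a simplified model for illustration, not the
Am29PL141 program: the state names and the `command_done` input
are hypothetical, and synchronizer delays are ignored.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # waiting for the host to select the board
    EXECUTE = auto()       # command routine running, EXT_READY inactive
    WAIT_RELEASE = auto()  # EXT_READY active until the host deselects


class ReadyHandshake:
    def __init__(self):
        self.state = State.IDLE
        self.ext_ready = False

    def clock(self, selected, command_done=False):
        """Advance one Am29300 clock; return the EXT_READY level."""
        if self.state is State.IDLE:
            if selected:
                self.state = State.EXECUTE
        elif self.state is State.EXECUTE:
            if command_done:
                self.state = State.WAIT_RELEASE
        elif not selected:              # WAIT_RELEASE and host let go
            self.state = State.IDLE
        self.ext_ready = self.state is State.WAIT_RELEASE
        return self.ext_ready
```

Either side can stretch the transaction in this model: the
controller by delaying `command_done`, the host by keeping
`selected` asserted after EXT_READY goes active.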


An interaction diagram is shown below for a bus
transaction between the host and the Am29300 system. The
single-line dividers indicate one clock cycle of the
Am29300 system. The double-line dividers indicate one
or more clocks as needed for synchronization or
algorithm execution.


External Bus Activity          Am29300 System Activity

Address to Am29300 is          CPU is active.
active on the bus.             CPU owns MA and MD bus.

Address is clocked into        CPU is still active.
the host interface             CPU still owns internal bus.
controller synchronizing       Host interface controller
register.                      performs branch to command
                               routine.

External bus                   CPU clocks are stopped.
transceivers are enabled       CPU bus buffers are disabled.
if needed.                     Host interface executes first
                               instruction of command routine.
                               READY may or may not be made
                               active depending on routine.

If READY is inactive,          If READY is active, then
wait for host interface        wait for host to
to complete algorithm          release external bus by
and make READY active.         stopping selection of
CPU operation is still         the Am29300 system.
suspended.

External bus address           CPU still suspended.
no longer selects              Host interface waiting to
Am29300 system.                see host release bus.

Lack of external bus           CPU still suspended.
address is clocked into        Host interface branches back
host interface sync            to idle loop.
register.

External bus transceiver       CPU clocks are active.
is disabled.                   CPU has MA and MD bus access.
                               Host interface waits in idle
                               loop for next command.


The length of an external bus transaction can vary from
about 6 Am29300 system clock cycles for a memory
access, to about 80 clock cycles for an SSR shift
operation. Regardless of the transaction type, the
Am29300 system looks to the host like a slave bus
peripheral. Sometimes, as in the case of the SSR shift
operation, it is a rather slow peripheral.


Program Definition


A detailed definition of the host interface controller's 
algorithm is contained in Appendix E. 


MEMORY 
Memory Components 


The memory device used to construct the 16K word x
36-bit memory is the Am99C165. This is a 16K x 4-bit CMOS
static RAM memory. The 35 ns access time version is
assumed in any timing estimates for the Am29300
demonstration system. Nine memories are used as
shown in Figure 4-6.


The Am99C165 is used so that an additional output 
enable is available to help prevent bus contention with 
other buffers on the MD_BUS. The memory outputs are 
disabled whenever the memory write enable line is 
active. The write enable line is also used to control the 
direction of the external bus data transceiver and the 
enable on the CPU data buffer. The delay of the inverter
on the output enable input to the memory has been 
matched by a buffer in each of the other bus drivers just 
noted. This is so that when a write operation is signalled, 
each bus driver receives its bus enable or disable signal 
at the same time as the memory. This overlaps the turn 
off time of the memory outputs with the turn on time of the 
other bus drivers to minimize bus contention with the 
memory. 


The enable line to the memory is used to power down the 
memory when it is not being selected by the Am29300 
CPU. 


The write enable line to the memory is gated with the 
Am29300 system free-running clock. This keeps the 
write line high (inactive) until late in the cycle when all 
the control signals that feed into the memory enable 
have settled. This is important for cycles in which there 
is a change of ownership on the memory address and 
data buses. The gating with clock ensures that unin- 
tended pulses on the write enable line that may occur 
early in the system cycle will not cause spurious writes in 
the memory. 


Addressing Scheme 


Description: With reference to Figure 4-1, the memory
address bus (MA_BUS) is not only the address input to
the memory, it is also a part of a 4 to 1 multiplexer. There
are four address drivers tied to the MA_BUS. They are:
the A_BUS to MA_BUS buffer, the External Bus address
to MA_BUS buffer, and the two memory address
counters. Each of these sources has three-state output drivers
and, by careful control of which source is allowed to drive
the MA_BUS at any one time, the sources form the 4 to
1 multiplexer.


In this way the memory can be addressed directly by the
A_BUS or the External Bus. The memory can also be
addressed indirectly by the A_BUS via the memory
address counters.
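

The idea of four three-state drivers forming a 4 to 1
multiplexer can be sketched in software as follows. The driver
names and the function are hypothetical, and treating more than
one enabled driver as an error simply models the contention that
the control logic is designed to prevent.

```python
def resolve_ma_bus(drivers):
    """Return the MA_BUS value given {name: (output_enabled, value)}.

    Returns None when no driver is enabled (bus floating).
    """
    active = [(name, value) for name, (enabled, value) in drivers.items()
              if enabled]
    if len(active) > 1:
        names = ", ".join(name for name, _ in active)
        raise RuntimeError("MA_BUS contention between: " + names)
    return active[0][1] if active else None


drivers = {
    "A_BUS buffer": (False, 0x0123),
    "External Bus buffer": (False, 0x0456),
    "address counter A": (True, 0x0789),   # only this source drives the bus
    "address counter B": (False, 0x0ABC),
}
print(hex(resolve_ma_bus(drivers)))
```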





[Figure 4-6: memory array diagram. Inputs CLK_FREE_RUN,
MEM_WEN*, and SYS_MEM_EN*; Am99C165 devices U22 and U30
shown, with 7 additional memories to form a 36-bit word;
data on MD_BUS.]


The memory address counters are loadable up/down
counters that can serve as address pipeline registers,
sequencers, or stack pointers independent of the CPU’s
data section. They allow sequential reads or writes to
memory by the CPU without requiring the CPU to
calculate an address on every read or write cycle.


In fact, after loading a memory address counter with an 
initial address, the CPU can perform sequential read 
cycles while at the same time continuing to use the data 
section for other calculations. This is possible because of 
the dual write port design of the CPU register file. The 
memory data is loaded into the register file via the B write 
port while calculation results on the Y_BUS are stored 
through the A write port. 


Two counters are provided to allow for consecutive A and
B operand data fetches from two separate arrays of data
without the need to constantly reload the counter values.
Each counter is built from two AMPAL22V10
Programmable Array Logic (PAL) devices that act as two
cascaded 7-bit loadable up/down counters. The counters
are connected as shown in Figure 4-7. The logic
definition file for the PALs is given in Appendix F.
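

A 14-bit address counter built from two cascaded 7-bit
up/down counters can be modeled as below. This is a behavioral
sketch only: the class names, the carry/borrow cascading scheme,
and the wraparound at the ends of the range are assumptions, and
the actual PAL equations are those given in Appendix F.

```python
class Counter7:
    """Loadable 7-bit up/down counter stage."""

    def __init__(self):
        self.value = 0

    def load(self, value):
        self.value = value & 0x7F

    def step(self, up, enable=True):
        """Count once if enabled; return True on carry (up) or borrow (down)."""
        if not enable:
            return False
        if up:
            carry = self.value == 0x7F
            self.value = (self.value + 1) & 0x7F
        else:
            carry = self.value == 0x00
            self.value = (self.value - 1) & 0x7F
        return carry


class AddressCounter:
    """14-bit counter made of an LSB stage cascaded into an MSB stage."""

    def __init__(self):
        self.lsb = Counter7()
        self.msb = Counter7()

    def load(self, address):
        self.lsb.load(address)
        self.msb.load(address >> 7)

    def count(self, up=True):
        carry = self.lsb.step(up)
        self.msb.step(up, enable=carry)  # MSB stage moves only on carry/borrow

    @property
    def value(self):
        return (self.msb.value << 7) | self.lsb.value
```

Loading an initial address and then calling `count()` once per
memory cycle gives the sequential addressing described above
without involving the data section.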


The two counters are only loaded from the A_BUS and
not the External Bus, even though the connection of the
counters to the MA_BUS would permit the latter. This is
due to the difficulty in coordinating the use of the counters
between the CPU and the External Bus. The counters are
simply viewed as a resource of the CPU only.


[Figure 4-7 (part): Counter A LSB and Counter B LSB, each an
Am22V10, bits 0 to 6.]


Why This Approach?: Why address the memory from 
the A_BUS? Doing so means that data in the memory is 
selected by an address previously stored in the register 
file. So one cycle must be used to calculate an address 
in the data section of the CPU, store the result in the 
register file, and take a second cycle to actually address 
the memory. Why not just take the address as it is 
calculated and feed it directly from the Y_BUS to the 
memory? 


First, the access time is better from the A_BUS than from
the Y_BUS. The A_BUS address is valid 45 ns into a
cycle which still leaves time to access a fast static RAM
in the same time that data would normally flow from the
A_BUS through the ALU and back to the register file. An
address on the Y_BUS would not be valid until 87 ns
into a cycle, which would require either that the memory
access extend the cycle length significantly or that the
address be pipelined into a memory address register and
be used to address the memory in a second cycle.


Second, since the register file can present two data
words in one cycle it is possible to address the memory
and provide write data in the same cycle; the address and
data go from the register file to the memory. If the Y_BUS
is used as the path to the memory in a write operation, a
second cycle must be used to provide the write data.


[Figure 4-7 (part): Counter A MSB and Counter B MSB, each an
Am22V10, bits 7 to 13, connected to MA_BUS.]


Third, the above comments are trick answers. If the two
approaches of A_BUS or Y_BUS as the memory address
path are carefully examined it can be seen that it is really
a situation of “six of one, or half a dozen of the other”.


Ultimately, in either case, a cycle is used to calculate the
address and a second cycle is used to read or write the
memory; there is only one data path in the system and
only one calculation can occur in a cycle. Between the
two approaches there are various ways to overlap other
calculations with memory accesses to make the best use
of the system’s time, but either approach takes the same
time.


The real difference is that the A_BUS method is simpler
from the microprogrammer’s point of view. With the 
A_BUS method a memory read is done in one cycle and 
the resulting data is in the register file in the next cycle. 
With the Y_BUS approach there is a one cycle delay 
between a read access and the return of data, which 
requires that the microprogrammer “fill in the hole” in the 
microcode with other useful work to get the same system 
efficiency. So, as a designer’s preference, the A_BUS for 
memory address approach is used. 


[Figure 4-8 input signals: P_MEM_WR*, A_BUS (2:15)]





CPU - Memory Buffers 


The address buffers from the A_BUS to the MA_BUS and
the data buffers from the B_BUS to the MD_BUS are
shown in Figure 4-8. The address and data buffers are
built from Am29827 10-bit-wide high speed buffers.


[Diagram: Am29827 buffers enabled by CPU_BUS_EN*; 36-bit
B_BUS input; outputs MEM_WEN*, MA_BUS, and MD_BUS.]

Figure 4-8. CPU to Memory Bus Buffers


The address bus is 14 bits wide to address 16K words of
36-bit-wide memory. But these bits are taken from bit
positions 2:15 of the A_BUS. This leaves the two least
significant bits of the A_BUS unused and therefore treats
the address as being in terms of bytes with the
addressing restricted to four-byte (word) boundaries. This was
done so that interface with an external host bus would be
simpler. Many of the host systems with which this
demonstration system could be mated use byte addressing.
With the above address scheme, all the address line
numbering is consistent between the host and CPU. In
addition, if there were a future need to allow byte
addressing of the CPU memory, it would be possible with
only a minor change to the address buffer wiring. Also, it
may be noted that the parity bits on the A_BUS have been
ignored in the MA_BUS since there is no parity checking
implemented on the memory address.
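

The mapping from a byte-style address on bits 2:15 to a word
in the 16K memory can be written out as a one-line calculation.
The function name is hypothetical; the bit positions are the
ones given in the text.

```python
def word_index(byte_address):
    """Return the 14-bit memory word index taken from address bits 2:15."""
    return (byte_address >> 2) & 0x3FFF   # drop bits 1:0, keep 14 bits


# Byte addresses 4 through 7 all fall in word 1.
print(word_index(0x0004), word_index(0x0007), word_index(0x0008))
```

Because the two least significant bits are simply dropped, the
four byte addresses within a word select the same memory
location.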


The data buffers are arranged as one buffer per byte of 
the B_ BUS (with parity on each byte). Note that, since the 
B_BUS provides only write data, and read data from the 
memory is received by the register file, only a unidirec- 
tional buffer is needed. 


Whenever the external bus interface does not have the
memory buses in use, the CPU to memory buffers
receive the CPU_BUS_EN* signal to enable the buffers.
If the operation is a write, the CPU_WEN* signal is
provided by the CPU.


Note that the CPU_WEN* is routed through the address 
buffer twice and then to the data buffer to enable it on a
write operation. This is done to help equalize the timing 
between this buffer and the output enable on the mem- 
ory. Note also that the address buffers have a second 
enable input that is controlled by the control pipeline bits 
that manage whether the memory address comes from 
the A_BUS or from one of the memory address counters. 


Figure 4-9. External Bus Buffers 





External System Buffers 


The address buffers from the External Bus to the 
MA_BUS and the data buffers from the External Bus to 
the MD_BUS are shown in Figure 4-9. The address bus 
is built from Am29827 10-bit-wide high speed buffers. 
These buffers are connected in exactly the same way as 
described above for the CPU to memory address buffers. 


The data buffers are, however, different from the earlier 
circuit description. These buffers are Am29863 non- 
inverting 9-bit high speed transceivers. The transceivers 
allow data to be both read and written by the external bus. 


When the external host system addresses the Am29300 
CPU memory, the external bus interface controller halts 
the system clocks in the CPU and disconnects the CPU 
from the MA_BUS and MD_BUS by making 
CPU_BUS_EN* inactive. Then the external bus is con- 
nected to the memory by making EXT_BUS_EN* active 
to enable the external bus buffers. The external bus 
supplies a write enable if the operation will be a write. 
Note again that the write enable timing is equalized with 
that of the write enable to the memory. 


SECTION 5 
Control Section Description 


MACRO OPCODE SUPPORT 


Macro Opcode Register 


In order for the control section of the CPU to make use of 
a macroinstruction, the instruction must be selected from 
memory and loaded into a register that is accessible to 
the control section. 


This register is called the macro opcode register. It is a 
32-bit register made from four Am29818-1 pipeline diag- 
nostic registers. This register is shown in Figure 5-1. 


The most significant 14 bits (bits 31:18) of the register 
output are used as the macro opcode. Bits 31:22 are 
connected to the address inputs of the macro opcode 
map RAM. Bits 21:18 are connected to one of the 
Am29331 sequencer’s multi-way branch inputs. These 
lower four bits may thus be used as an opcode modifier 
via a multi-way branch. 


Bits 17:0 are the instruction operand register addresses. 
These bits are divided into three 6-bit fields, one for each 
register file port. Bits 17:12 are used as the register file ‘A’ 
read port address. Bits 11:6 are used as the ‘B’ read port 
address. Bits 5:0 are used as the register file ‘A’ write port 
address. These addresses are respectively referred to 
as the ‘A’, ‘B’, and ‘C’ operand register addresses. 


These three addresses allow macroinstructions to spec-
ify directly three address operations with two read oper-
ands and a separate write operand. Note, however, that
these bits are connected to the macro operand 
address counters, which in turn are used to address the 
register file. This is more fully described in a later section. 


Figure 5-1. Macro Opcode Register 


In addition, bits 23:18 are connected to the position 
multiplexer. This allows macro instructions to specify 
directly the ALU position input as the lower bits of the 
opcode. Taking the position information from these bits 
still leaves all of the operand register addresses free for 
use in three address operations. 


Also, bits 4:0 are connected to the width multiplexer. This 
allows macro instructions to specify directly the width 
input of the ALU for use in masked operations. Although 
this overrides this field of the opcode for use as the ‘C’ 
operand address, the ‘C’ operand address may inter- 
nally be specified as the same as either the ‘A’ or ‘B’ 
operand register addresses. Thus two address macroin- 
structions involving width, or width and position specifi- 
ers are possible. 
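
The field connections described above can be summarized in a short Python sketch. This is only an illustrative model of the bit assignments stated in the text; the names `MacroFields`, `_field`, and `decode_macro` are hypothetical and do not come from the demonstration system.

```python
from typing import NamedTuple


class MacroFields(NamedTuple):
    opcode: int    # bits 31:22, address into the macro opcode map RAM
    modifier: int  # bits 21:18, sequencer multi-way branch input
    a_addr: int    # bits 17:12, 'A' operand register address
    b_addr: int    # bits 11:6,  'B' operand register address
    c_addr: int    # bits 5:0,   'C' operand register address
    position: int  # bits 23:18, overlaps the low end of the opcode
    width: int     # bits 4:0,   overlaps the 'C' operand address


def _field(word: int, msb: int, lsb: int) -> int:
    """Extract bits msb:lsb (inclusive) from a 32-bit word."""
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def decode_macro(word: int) -> MacroFields:
    """Split a 32-bit macroinstruction into the fields wired in hardware."""
    return MacroFields(
        opcode=_field(word, 31, 22),
        modifier=_field(word, 21, 18),
        a_addr=_field(word, 17, 12),
        b_addr=_field(word, 11, 6),
        c_addr=_field(word, 5, 0),
        position=_field(word, 23, 18),
        width=_field(word, 4, 0),
    )


if __name__ == "__main__":
    print(decode_macro((0x155 << 22) | (0xA << 18) | (0x21 << 12) | (0x12 << 6) | 0x3F))
```

Note that the position and width values are not separate fields; they are alternate views of bits already used by the opcode and the 'C' operand address, which is why a given macroinstruction format uses one interpretation or the other.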


Macro Opcode Format Restrictions 


Because of the large number of possible macroin- 
struction formats, this application note will not attempt to 
provide a detailed macroinstruction set definition. It is 
only important that the format restrictions imposed by the 
hardware design be stated. 


As defined by connections of the macro opcode register, 
the macro opcode must always be located within bits 
31:22. The size and position of the opcode within this field 
are determined by how the macro opcode map RAM is 
set up to interpret and map the opcode. The optional 
opcode modifier (multi-way branch input) must be in bits 
21:18 if it is used. 


Figure 5-2. Example Macro Opcode Formats 


The optional position field must be in bits 23:18 if used 
and the optional width field must come from bits 4:0 
when used. 


All three of the operand register addresses are optional 
and if used must come from the fields specified in the last 
section. The operand positions are fixed for the ‘A’ and ‘B’ 
operands since they may only come from the ‘A’ or ‘B’ 
operand bits of the macro opcode register. The ‘C’ 
operand address may come from any of the three 
operand fields. 


The reason that the ‘A’ and ‘B’ operands do not share the 
positional flexibility of the ‘C’ operand is that the ‘A’ and 
‘B’ operands specify registers to be read from the register 
file. These read addresses are in the critical timing path 
for the system, and any excess delay in selecting the 
address adds directly to the system cycle time. A multi- 
plexer like that used for the ‘C’ operand address would 
lengthen the cycle undesirably. The ‘C’ operand address 
can afford its multiplexer delay since the ‘C’ operand 
address is not used by the register file until late in the 
machine cycle. 


Each operand address is optional, because the operand 
address may always be specified in the microcode. 


Any optional field, even an unused portion of the opcode 
field, may be used as a data operand. Where a field is not 
used as part of the instruction control, it may be treated 
as data by loading the macroinstruction into the register 
file. Once the instruction is in the data section of the 
system, any data field may be extracted and used in 
calculations. 


Some example macroinstruction formats are shown in 
Figure 5-2. The instructions are shown in a 32-bit word 
layout (byte parity is ignored for the moment). 


Macro Opcode Decoding Method 


The opcode portion of the macroinstruction is the index 
into the control store for the location of the first instruction 
of a microcode subroutine. Translating the bit pattern of 
the opcode into the microcode store address may be 
done several ways. 


The opcode could be used directly to point to a table of 
first instructions at the base of the microcode store. In 
such a scheme all microcode routines longer than one 
word would require the first word of the routine to branch 
to the remaining part of the routine elsewhere in the 
microcode store. This would break up many routines into 
different parts of microcode store. It may also be ineffi- 
cient, depending on what other functions the branch field 
of the microcode word could have performed if the first 
word of the routine did not have to be a branch. 


The opcode could be used directly with zeros inserted at 
the least significant end to form an address that would 
point to microcode entry points separated by 2, 4, 8, 16, 
etc. words, depending on the number of zeros appended. 
This would allow more routines to be located in contigu- 
ous words. Only routines longer than the entry point 
spacing would have to be split by branching to other parts 
of microcode store. The disadvantage is that where 
routines are shorter than the entry point spacing, there 
would be unused holes in the microcode store. When 
microprograms are expanded and the microcode store 
gets full (as memories always seem to do), the micropro- 
grams will be split more and more times to fit into the 
unused holes in the microcode store. This will make the 
microprogram more difficult to design and debug as the 
microcode store fills up. 


A PAL may be programmed to decode the opcode into 
entry point addresses spaced to fit the microprograms. 
This allows the microcode words of the routines to be 
kept together in consecutive locations, making design 
and debugging of programs easier. But each time rou- 
tines are moved or expanded in size, a new program for 
the opcode mapping PAL must be defined. 


A RAM or PROM memory may be used as a look-up table 
for entry points in the microcode store. This allows the 
greatest flexibility. Microcode routines may be located 
anywhere in control store, independent of the opcode 
value. The entry points may be spaced to fit each routine. 
As routines are changed or moved, it is very easy to 
reload the look-up table with new entry points. 


The opcode mapping method chosen for this system is 
the RAM approach. 
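
To make the trade-off concrete, the sketch below contrasts two of the mapping methods discussed above: appending zeros to the opcode and using a look-up table. It is a simplified software model, not the hardware design; the names `entry_by_shift` and `MapRam` are hypothetical, and the table size (1K entries of 12 bits) is taken from the map RAM description in this section.

```python
def entry_by_shift(opcode: int, zeros: int) -> int:
    """Zero-append method: entry points are spaced 2**zeros words apart."""
    return opcode << zeros


class MapRam:
    """Look-up table model: 1K entries, each a 12-bit control store address."""

    def __init__(self) -> None:
        self.table = [0] * 1024

    def load(self, opcode: int, entry_point: int) -> None:
        """Store the entry point for one opcode (control store is 4K words)."""
        if not 0 <= entry_point < 4096:
            raise ValueError("entry point must fit in 12 bits")
        self.table[opcode & 0x3FF] = entry_point

    def lookup(self, macro_word: int) -> int:
        """Index the table with macro opcode register bits 31:22."""
        return self.table[(macro_word >> 22) & 0x3FF]


if __name__ == "__main__":
    print(entry_by_shift(5, 3))          # opcode 5, entry points 8 words apart
    ram = MapRam()
    ram.load(0x155, 0xABC)               # routine placed anywhere in control store
    print(hex(ram.lookup(0x155 << 22)))
```

With the table approach, moving a routine only requires calling `load` again for the affected opcodes, which mirrors the ease of reloading the map RAM described above.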


Macro Opcode Map RAM 


The map RAM is shown in Figure 5-3. It is formed from 
three Am9150 1K x 4 bit separate I/O high speed RAMs. 


Together, the three RAMs provide a 12-bit output which 
is used as the microinstruction decode address. The 
address is limited to 12 bits since the maximum size of 
control store provided for in this system is 4K words. 


This decode address is connected to the ‘A’ address 
input of the Am29331 sequencer. When this address is 
selected by the sequencer, a branch is made to the first 
microinstruction of the selected routine. 


The address input to all the Am9150s comes from the 
most significant bits of the Macro Opcode Register (bits 
31:22). This address selects the entry point into microc- 
ode control store from the map RAM when a macroin- 
struction is decoded. The macro opcode register is also 
used during diagnostics and WCS loading to address the 
map RAM. 


The Am9150 RAMs are always selected and output 
enabled since no other device shares the ‘A’ input of the 
sequencer. Also the Am9150 has no power down mode, 
so there would be no advantage to deselecting the 
memory. Note: if lower power in the system is required, 
an alternate memory to use in implementing the map 
RAM would be the Am2148. That memory does save sig- 
nificant power when deselected and would increase map 
RAM access time only slightly. 


When the Am9150 RAMs are loaded with data, they 
are written with data as though they were an extension 
of the microcode control store. The writable control 
store write enable line is connected to the Am9150’s 
write enable input. 


WCS Port 


Also shown in Figure 5-3 is the Writable Control Store 
(WCS) port. This port is formed from two Am29818-1 
pipeline diagnostics registers. The port was shown in 
block form in Figure 4-5. The port is used as part of the 
system serial diagnostics and writable control store load- 
ing scheme. 


The bidirectional “inputs” of the Am29818-1 are con- 
nected to the macro opcode map RAM data inputs. When 
placed in a special mode, the port “inputs” are driven as 
data outputs. This data is then used as input to the map 
RAM during a WCS write operation. The data comes 
from the Am29818-1’s internal shadow register. 


The outputs of the WCS port are connected to the 
microcode control store address lines. The WCS port 
may thus be used as an alternate address source for the 
microcode control store. During a diagnostic read or 
write of the control store, the WCS port provides the 
needed address. 


Note that the data for the outputs of the WCS port comes 
from the Am29818-1’s internal pipeline register. The 
pipeline register contents are independent of the shadow 
register contents. This allows an address for the microc- 
ode control store to be in the pipeline register at the same 
time data for the map RAM is in the shadow register. 
These separate registers allow the WCS and map RAM 
to be written in the same cycle as though they were one 
writable control store. 


Figure 5-3. Macro Opcode Map RAM 


Macro Operand Address Counters 


These are three identical loadable up/down binary count- 
ers made from AmPAL22V10 PALs. They are shown in 
Figure 5-4. The logic definition file for the PALs is 
shown in Appendix G. 


One counter is used for each operand register address. 
The counters are loaded from the data outputs of the 
macro opcode register. The outputs of the counters are 
tied to the address inputs of the read and write ports of the 
Am29334 register file. 


The counter load, count direction, output enable, and 
count enable functions are internally decoded from in- 
puts that come from the control pipeline register. These 
counters are intended for use in array processing algo- 
rithms, one example being a digital signal processing 
algorithm for a filter. 


The counters make it simple to perform the same calcu- 
lation on arrays of data stored in the register file. One 
microinstruction or a short microinstruction routine can 
loop on an array calculation and at the end of each 
calculation cycle simply increment the operand address 
counters. In that way, new operands are fetched for each 
calculation on the array without the need for the microc- 
ode instructions to directly specify operand addresses. 


Control pipeline bits determine whether the microcode 
operand address or the macro operand counter address 
is used. The selection is independent for each operand 
address. Thus, an example would be the operand ‘A’ 
address coming from the microcode while the ‘B’ 
operand and ‘C’ operand addresses come from the 
counters. 


An additional feature is that the ‘C’ operand counter 
address may be directed to the Am29334 register file ‘B’ 
write port address input. This allows the ‘C’ operand 
address to come from microcode while the ‘C’ operand 
counter address is used in writing data from system 
memory into the register file via the second write port. 
This means that CPU calculations may continue 
uninterrupted while new data is being loaded into the 
register file. Also, as long as data is coming from sequen- 
tial locations in memory and going to sequential locations 
in the register file, the memory address counter and ‘C’ 
operand counter may be incremented together, thus 
loading several memory words in sequence. This loading 
may be accomplished without repeated address calcula- 
tion by the CPU. 


Operand Counter Use Example 


To help illustrate the use of the operand address count- 
ers, a typical Finite Impulse Response (FIR) digital signal 
processing filter algorithm is described here. 


An FIR digital filter takes in a stream of amplitude 
samples from an analog waveform. Each sample is 
processed through a series of calculations to produce an 
output value. The resulting stream of output amplitude 
values produces a waveform that is the result of a filter 
operation on the input waveform. 


The calculations involved are a series of multiplies be- 
tween different coefficient values and several past input 
samples. The result of each multiply is accumulated to 
produce one output value. The number of coefficients 
and retained past samples determines how selective 
the filter operation is. The values of the coefficients de- 
termine the type of filter operation; e.g., bandpass vs. 
lowpass. 


Figure 5-4. Macro Operand Address Counters 


The algorithm for calculating one output value would be 
the following: 


Sum := 0; 
for n := 0 to number_of_coefficients - 1 do 
    Sum := Sum + (Sample(x - n) * Coefficient(n)); 


Each time a new input sample is acquired, the new 
sample becomes Sample(x), and all past samples shift 
down in the sample array such that Sample(x - 1) := 
Sample(x) for all x. Note that the number of retained past 
samples is equal to the number of coefficients. 
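
For readers who prefer a runnable form, the following Python sketch expresses the same multiply-accumulate loop and sample shift. It is a plain software illustration, not a model of the hardware; the function names `fir_output` and `push_sample` are hypothetical, and the sample history is assumed to be stored newest first.

```python
def fir_output(samples, coefficients):
    """Compute one FIR output value.

    samples[n] holds Sample(x - n), so samples[0] is the newest sample.
    One product is accumulated per coefficient.
    """
    total = 0
    for n in range(len(coefficients)):
        total += samples[n] * coefficients[n]
    return total


def push_sample(samples, new_sample):
    """Shift the history down by one and discard the oldest sample."""
    samples.insert(0, new_sample)
    samples.pop()


if __name__ == "__main__":
    history = [3, 2, 1]            # newest sample first
    taps = [1, 10, 100]
    print(fir_output(history, taps))
    push_sample(history, 4)
    print(history, fir_output(history, taps))
```

The history list always holds exactly as many samples as there are coefficients, matching the note above.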


This algorithm may be implemented with two arrays of 
data and a temporary register. One array contains coef- 
ficients and the other contains past input samples. 


The coefficient and sample operands may be multiplied 
in a single system cycle by either the Parallel Multiplier or 
the Floating Point Processor. The Parallel Multiplier may 
also perform an accumulate in the same cycle. The 
Floating Point Processor requires a second cycle to do 
the accumulate function. So for each multiply and accu- 
mulate operation on a sample-coefficient pair, either one 
or two cycles are needed. 


Obviously the operand counters may be used to address 
the data arrays. As each coefficient-sample pair is multi- 
ply-accumulated, the counters are incremented to point 
to the next pair of operands. This allows the inner 
multiply-accumulate loop to be only one or two microin- 
structions long. 


One feature of the operand counters adds to the effi- 
ciency of this algorithm. When an operand counter 
reaches either the maximum or minimum count value, 
the counter will reload the original count value from the 
macro opcode register on the next increment. This cre- 
ates a counter that may treat the register file as a circular 
buffer. The length of the buffer is determined by the 
distance from the original count value to either the base 
or upper limit of the register file address. 


Note also that if one counter is always incremented while 
the other is decremented, two circular buffers may share 
the register file. One has a lower bound of zero and the 
other an upper bound of 63. With this scheme two equal 
size buffers could be up to 32 words each. 
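
The reload behavior can be modeled with a few lines of Python. This is a behavioral sketch only, not the PAL logic from Appendix G; the class name `OperandCounter` is hypothetical, and it is assumed that the reload is triggered by the limit in the current count direction (63 when counting up, 0 when counting down).

```python
class OperandCounter:
    """Behavioral model of a 6-bit loadable up/down operand address counter."""

    MAX = 63  # highest register file address
    MIN = 0   # lowest register file address

    def __init__(self, start: int) -> None:
        # 'start' stands in for the value loaded from the macro opcode register.
        self.start = start & 0x3F
        self.value = self.start

    def step(self, up: bool = True) -> int:
        """Advance one count; at the limit, reload the original value."""
        limit = self.MAX if up else self.MIN
        if self.value == limit:
            self.value = self.start
        else:
            self.value += 1 if up else -1
        return self.value


if __name__ == "__main__":
    up_counter = OperandCounter(60)      # circular buffer over registers 60..63
    print([up_counter.step() for _ in range(5)])
    down_counter = OperandCounter(2)     # circular buffer over registers 2..0
    print([down_counter.step(up=False) for _ in range(4)])
```

An incrementing counter started at 32 and a decrementing counter started at 31 would give the two 32-word buffers mentioned above.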


The circular buffer approach to the arrays works well with 
the FIR filter algorithm. At the end of each output value 
calculation, the counter addresses will point back to the 
first coefficient-sample pair, ready for the next input 
sample iteration. 


Note that if on the last multiply-accumulate cycle of an 
iteration the sample operand counter is not incre- 
mented, and the ‘C’ operand counter is used to load a 
new sample from memory into the oldest sample array 
location, the effect will be to shift all the samples down by 
one in the array while overlapping the new sample load 
with the last cycle of a sample iteration. 


One additional cycle at the end of each iteration may 
move the output value from the register file to the mem- 
ory. No memory address calculation cycle is needed 
since the memory address counter may be used to 
address the memory. 


With this scheme only one cycle of overhead between 
iterations is needed. Therefore, assuming clocked multi- 
ply operation of the PM to achieve single cycle multiply- 
accumulate execution, a 31 coefficient FIR could com- 
plete one output value iteration in 32 cycles. Assuming a 
100 ns cycle time (100 ns clocked multiply in the PM), 
that would allow over 312,000 samples per second or an 
input bandwidth of over 156 kHz. A 9 coefficient filter 
would have a 500 kHz bandwidth. 
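
The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the text does, one multiply-accumulate cycle per coefficient plus one overhead cycle per iteration, and it takes the usable input bandwidth as half the sample rate. The function name `fir_rates` is hypothetical.

```python
def fir_rates(num_coefficients: int, cycle_ns: int = 100, overhead_cycles: int = 1):
    """Return (cycles per output, samples per second, input bandwidth in Hz)."""
    cycles = num_coefficients + overhead_cycles
    # 1e9 ns per second divided by the time for one output value iteration.
    samples_per_second = 1_000_000_000 / (cycles * cycle_ns)
    # Bandwidth is taken as half the sample rate.
    bandwidth_hz = samples_per_second / 2
    return cycles, samples_per_second, bandwidth_hz


if __name__ == "__main__":
    print(fir_rates(31))   # 32 cycles per output value
    print(fir_rates(9))    # 10 cycles per output value
```

For 31 coefficients this gives 32 cycles, 312,500 samples per second, and 156.25 kHz; for 9 coefficients it gives 10 cycles, 1,000,000 samples per second, and 500 kHz, in agreement with the figures quoted above.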


This is an example of how a microprogrammed system 
may have its architecture tuned to a particular applica- 
tion for the best possible performance. Much of the 
performance comes from the microprogrammed 
system's ability to control and perform several parallel 
functions at one time. 


REGISTER FILE ADDRESS MULTIPLEXER 


The Register File Address Multiplexer, shown in the 
block diagram of Figure 1-2, is made up of four sepa- 
rate multiplexers. One multiplexer is used for each regis- 
ter file address port; two read ports and two write ports. 


Read Ports A and B 


These multiplexers are shown in Figures 5-4 and 5-5. 
Each multiplexer is really a three-state bus that may be 
driven either from the control pipeline register via an 
Am29827 three-state buffer or from an operand counter 
output. A bit for each address from the control pipeline 
selects which source may drive each address bus. 


The Am29827 three-state buffers are needed in addition 
to the three-state outputs of the control pipeline because 
each operand address is 6 bits. This number does not fit 
well into the 4-bit boundaries of each slice of the microc- 
ode control store. So to avoid wasting control store bits, 
the external three-state buffer is used to gate the control 
pipeline address onto the register file address bus rather 
than trying to use the control store’s own three-state 
outputs. 


Figure 5-5. Register File Address MUX, Read Ports 


Figure 5-6. Register File Address MUX, Write Port A 


Write Port A 


This multiplexer is implemented by a pair of AmPAL18P8 
PALs. It is shown in Figure 5-6. The logic definition file 
for the PALs is contained in Appendix H. 


It is this four input hex multiplexer that allows the ‘C’ 
register file operand (i.e., register file ‘A’ write port) 
address to come from four possible sources. The ad- 
dress may be provided from the ‘C’ operand in the control 
store, ‘C’ operand counter, ‘A’ operand final address, or 
‘B’ operand final address. The ‘A’ and ‘B’ operand ad- 
dresses are referred to as final because the multiplexer 
input is taken from the register address buses after the 
choice between control pipeline or operand counter has 
been made for the ‘A’ and ‘B’ operand addresses. The 
select bits for the multiplexer come from the control 
pipeline. 
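
The selection performed by this multiplexer can be sketched in Python as follows. This is only an illustration of the four sources named above; the function name `write_port_a_address` and the particular select encoding (0 to 3 in the order listed) are assumptions, since the actual encoding is defined by the PAL logic in Appendix H.

```python
def write_port_a_address(select: int, c_microcode: int, c_counter: int,
                         a_final: int, b_final: int) -> int:
    """Return the 6-bit register file 'A' write port address.

    Assumed select encoding (hypothetical):
      0 = 'C' operand from the control store
      1 = 'C' operand counter
      2 = 'A' operand final address
      3 = 'B' operand final address
    """
    sources = (c_microcode, c_counter, a_final, b_final)
    return sources[select & 0b11] & 0x3F


if __name__ == "__main__":
    # Two-address operation: write the result back to the 'A' operand register.
    print(write_port_a_address(2, c_microcode=5, c_counter=9, a_final=17, b_final=33))
```

Selecting the 'A' or 'B' final address is what allows two address macroinstructions to reuse a read operand register as the destination.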


Figure 5-7. Register File Address MUX, Write Port B 


Figure 5-8. Position and Width MUX 


Write Port B 


This multiplexer is made from an AmPAL22V10. It 
operates as a two input hex multiplexer. It is shown in 
Figure 5-7. The logic definition file for the PAL is given 
in Appendix I. 


It selects either the control pipeline ‘C’ operand address 
or the ‘C’ operand counter address as the source for the 
register file ‘B’ write port address. The select bit comes 
from the control pipeline register. 


POSITION AND WIDTH MULTIPLEXERS 


The position and width multiplexers are implemented 
with AMPAL22V10A PALs. They are shown in Fig- 
ure 5-8. The logic definition file for the PALs is given in 
Appendix I. 


Each is a two input hex multiplexer, identical to the 
multiplexer used for the B Write Port Mux. They select 
from the Position and Width values that may be provided 
either from the control pipeline or the Macro Opcode 
Register. The select control comes from the control 
pipeline. 


‘A’ speed PALs are used here since these multiplexers 
are in the critical path to the ALU. They must use 7 ns 
less delay than the combined delay of the ‘A’ Read Port 
Mux and Register File access time. The required 7 ns 
advantage is consumed by the ALU’s longer propagation 
delay from Position input to Y output vs. Data input to Y 
output. 


SEQUENCER 


The sequencer is a 16-bit-wide address generator that 
controls the execution sequence of microinstructions 
stored in the microcode control store. It may handle 
interrupts or traps at any microinstruction boundary. 
An interrupt or trap is treated like an unexpected pro- 
cedure call. 


Figure 5-9. Sequencer Block 


Two independent branch inputs as well as four multi-way 
branch address sources are provided. One of the branch 
address inputs is bidirectional and may be used to read 
or write information in the sequencer’s internal 33-level 
deep stack. 


A 16-bit counter, test condition multiplexer, and break- 
point address comparator are also provided. The break- 
point comparator is used as a hardware aid to microcode 
debugging. The connections to the sequencer are shown 
in Figure 5-9. 


The sequencer’s ‘A’ branch address input is connected to 
the Macro Opcode map RAM output and is the path 
through which the macroinstruction specifies its entry 
point into microcode. 


The ‘D’ branch address input is tied to the D_BUS. 
Through this path, branch addresses or constants come 
from the control pipeline register and data may be ex- 
changed with the data section of the CPU. 


The ‘M0’ multi-way branch address input is connected to 
the macro opcode register bits 21:18. These bits may be 
used as a modifier to the macro opcode via a multi-way 
branch based on these bits. 


The ‘M1’ multi-way branch address inputs come from the 
Floating Point Processor (FPP) external status register. 
These bits are the overflow, underflow, invalid, and 
‘extra’ status flags from the FPP. The ‘extra’ status flag is 
the OR of the zero, NAN, and inexact status flags from the 
FPP. A single multi-way branch on these inputs may be 
used to detect and handle quickly any of the catastrophic 
status conditions from the FPP. If the ‘extra’ flag is active, 
it indicates that a second multi-way branch may be used 
to determine which of the ‘extra’ status flags is active. 


The FPP zero, NAN, and inexact status flags are con- 
nected to the ‘M2’ multi-way branch input of the se- 
quencer. 
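
The two-level status check can be illustrated with a small Python sketch. It models only the flag combination described above; the function name `fpp_branch_inputs` and the bit ordering within each multi-way input are assumptions made for the example and are not taken from the schematic.

```python
def fpp_branch_inputs(overflow: bool, underflow: bool, invalid: bool,
                      zero: bool, nan: bool, inexact: bool):
    """Return (m1, m2) values presented to the sequencer multi-way inputs.

    Assumed bit order (hypothetical):
      m1 = overflow, underflow, invalid, extra  (most to least significant)
      m2 = zero, NAN, inexact                   (most to least significant)
    """
    extra = int(zero or nan or inexact)   # OR of the three minor flags
    m1 = (int(overflow) << 3) | (int(underflow) << 2) | (int(invalid) << 1) | extra
    m2 = (int(zero) << 2) | (int(nan) << 1) | int(inexact)
    return m1, m2


if __name__ == "__main__":
    # A NAN result sets only 'extra' in m1; m2 then identifies the cause.
    print(fpp_branch_inputs(False, False, False, False, True, False))
```

A microprogram would branch on the first value and consult the second only when the 'extra' bit is set, so the common case costs a single multi-way branch.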


Figure 5-10. D Bus Transceiver 








The ‘M3’ multi-way branch input is tied to the ALU 
microprogram status outputs so that an alternate means 
of checking ALU status is available. A multi-way branch 
based on these bits is able to check multiple condition 
flags in a single cycle. 


The Force Continue and Carry-In inputs of the sequencer 
are active in a trap operation to prevent state change in 
the sequencer and capture the address of the trapped 
instruction in the interrupt return address register. Carry- 
in (CIN*) is driven high by a trap event signal from the trap 
logic in Figure 5-11. The trap event signal is also ORed 
with a signal from the control pipeline (P_FC) so that 
either signal will cause Force Continue to go high. The 
interrupt request input comes from the Trap circuit shown 
in Figure 5-11. 


The sequencer’s HOLD input is driven by the inverted 
value of the WCS_WR* signal from the host interface 
controller shown in Figure 4-3. When this signal is 
active, the sequencer’s output will be three-stated so 
the WCS Port may drive the microcode control store 
address lines without contending with the sequencer’s 
output drivers. 


The Slave input is grounded since no use of the mode is 
made in this demonstration system. 


The test condition inputs of the sequencer come from 
three sources. Conditions 11 through 7 are the ALU status 
bits for zero, overflow, sign, carry, and link. Conditions 6 
through 2 come from the Macro Status Register; these 
bits are the macro version of the same ALU status bits. 
Condition 1 comes from the FPP external status register 
bit for zero. Condition 0 is unused. 


Control for the sequencer’s interrupt enable, test condi- 
tion select, and instruction input comes from the control 
pipeline register. 


Figure 5-11. Interrupt and Trap Logic 


The sequencer’s D BUS output enable comes from the 
control decode logic. 


The sequencer A_FULL signal is used as an interrupt 
signal to the system interrupt controller. 


The Equal (breakpoint) signal is used as a trap event 
signal to the Trap Logic. 


Interrupt acknowledge goes to the interrupt controller 
and trap logic to enable the interrupt and trap vectors onto 
the microcode control store address bus when an inter- 
rupt is executed. 


The ‘Y’ outputs of the sequencer drive the microcode 
control store address lines to select each microin- 
struction. 


D BUS TRANSCEIVER 


The transceiver between the A_BUS and the D_BUS is 
shown in Figure 5-10. 


The D_BUS has no parity bits included whereas the 
A_BUS does contain parity. It is therefore necessary to 
provide parity generation for the data moved from the 
D_BUS to the A_BUS. 


The D_BUS is only 16 bits wide vs. the 32-bit-wide 
A_BUS. Thus it is also necessary to provide bus drivers 
and parity generators for the upper two bytes of the 
A_BUS, even though no variable data is passed to the 
A_BUS from the D_BUS through those bits. 


The transceiver and parity generator/checker function 
are combined in a single device type: the Am29853. Four 
of these are used in addition to an Am29862 inverting 
transceiver. The inverting transceiver is used on the 
parity bits because the Am29853 uses odd parity while 
the Am29300 system uses even parity. 


As an added convenience for when numeric constants 
are passed from the D_BUS to the A_BUS, an AND gate 
is provided to drive the inputs of the upper two bytes of 
the transceiver. If the AND gate is enabled by the control 
pipeline, the most significant bit of the D_BUS will be 
copied to all the upper bits on the A_BUS, thus perform- 
ing a sign extend for two’s complement numbers. If the 
AND gate is disabled, the upper bits of the A_BUS are 
forced to zero. 
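
The effect of the AND gate and the byte parity generation can be sketched in Python as follows. This is a functional illustration only; the function names `d_to_a_bus` and `even_parity_bits` are hypothetical, and the parity model simply assumes one even-parity bit per byte, as stated for the Am29300 system.

```python
def d_to_a_bus(d_bus: int, sign_extend: bool) -> int:
    """Form the 32-bit A_BUS value from a 16-bit D_BUS value.

    With sign_extend enabled, D_BUS bit 15 is copied to A_BUS bits 31:16;
    otherwise the upper 16 bits are forced to zero.
    """
    d = d_bus & 0xFFFF
    msb = (d >> 15) & 1
    upper = 0xFFFF if (sign_extend and msb) else 0x0000
    return (upper << 16) | d


def even_parity_bits(word32: int):
    """Return one parity bit per byte, least significant byte first.

    Each bit is chosen so that the byte plus its parity bit contains an
    even number of ones.
    """
    bits = []
    for byte_index in range(4):
        byte = (word32 >> (8 * byte_index)) & 0xFF
        bits.append(bin(byte).count("1") & 1)
    return bits


if __name__ == "__main__":
    value = d_to_a_bus(0x8001, sign_extend=True)
    print(hex(value), even_parity_bits(value))
```

The upper two bytes need parity generation even though they carry no variable data, because their contents (all zeros or all ones) must still arrive on the A_BUS with valid parity.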


Interrupt and Trap Philosophy 


What is a Trap? 


Traps are events that require the immediate attention of 
the CPU. The urgency of the event is so great that the 
CPU must not even complete the execution of the in- 
struction in progress in the cycle that the trap request 
happens. The CPU must not change any machine state 
in that cycle; it must store the address of the instruction 
that was to have been executed and must branch to a 
routine that services the trap event. 


The implication here is that the trap will prevent some 
disastrous change in machine state from which no recov- 
ery would be possible. Also implied is that the trap 
servicing routine may repair whatever the problem is and 
then return to complete the execution of the instruction 
where the trap occurred. 


One additional implication is that the trap event may be 
signaled early enough in the instruction cycle to prevent 
the clocking (change of machine state) that normally 
occurs at the end of each instruction. 


An example of a trap event could be a miss on cache 
memory access. To complete an instruction when the 
data being accessed from a cache is invalid would be a 
disaster with little chance for recovery. If a trap routine to 
update the cache may be executed instead of completing 
the instruction, the program may be saved. After the 
cache has the correct data, the trap routine may return to 
the aborted instruction to continue execution of the 
program as if no problem had existed. 


Another example of a trap would be a program break- 
point. When debugging a program it is very useful to be 
able to stop execution of a program just before executing 
a particular instruction. If this is done, the state of the 
machine before executing the breakpoint instruction may 
be examined. To do this the address of the breakpoint 
instruction is recognized as the instruction is fetched from 
microcode control store. In the next cycle before the 
instruction may complete, a trap occurs which branches 
to a debugging routine. When the programmer is ready to 
continue the program, a return from trap completes the 
execution of the breakpoint instruction. The breakpoint 
trap operation is easy to do, and hardware to implement 
it is already provided in the Am29331 sequencer. The 
breakpoint trap operation will be shown in the Trap Logic 
described later. 


What is an Interrupt? 


Interrupts are events that require the attention of the CPU 
soon. 


“Soon” is defined as faster than might happen if the event
were polled by a CPU program but later than a few 
microinstruction execution cycles. 


Interrupt events and the resolution of an interrupt are not 
directly tied to the CPU state. No disasters occur if a few
cycles pass by before the interrupt may be handled. 


Examples of events handled via interrupt could be: 
external mechanical events such as switches being 
opened or closed, an impending stack-full situation, a 
message signal from another processor, or a peripheral 
delay timer indicating time-out. 


In this demonstration system one other class of interrupt 
source is included. It is the parity error. A parity error 
implies corrupted data in a program that cannot be 
corrected. Since the influence of corrupted data on the 
program is difficult to determine or correct for, the af- 
fected program should be aborted. A parity error is, 
therefore, important to detect so that the program in 
which it occurs may be terminated and perhaps rerun 
with corrected data. 


Parity errors are treated as interrupts rather than traps for 
two reasons. The indication that an error has occurred 
comes fairly late in an instruction cycle and is therefore 
difficult to use as a trigger for a trap. When a parity error 
occurs, the program is generally corrupted and will be 
terminated; whether the termination happens in the cycle
following the error as would be the case with a trap, or 
within a few cycles, as with an interrupt, is unimportant. 


Interrupt Operations 


There is no need to design an interrupt circuit from 
scratch when one already exists. The Am29114 interrupt 
controller is used in this system. It provides interrupt 
latching, priority, masking, and vector generation for 
eight interrupt inputs. 


Interrupt Controller


Six interrupt sources are used in this Am29300 system;
the two remaining interrupt source inputs are available
for software generated interrupts.


The interrupt and trap circuit block diagram is shown in 
Figure 5-11. 


The three highest priority interrupts are parity error sig- 
nals from the D_BUS, the Am29C323 Parallel Multiplier, 
and the Am29332 ALU. 


The next priority interrupt is a signal from the FPP 
external status PAL, which indicates that one of the 
following status flags is active: Overflow, Underflow, or 
Invalid. 


The next priority interrupt is the A_FULL signal from the 
Am29331 sequencer. This interrupt indicates that the 
sequencer stack will be full if three additional stack 
pushes occur. 


The next interrupt is the external bus interrupt signal from 
the host interface controller. This is a “tap on the shoul- 
der” from the host that requests the Am29300 CPU take 
some previously agreed on action, such as reading a 
message from the host out of memory. 


The two least significant interrupts are unused by hard- 
ware and are available for use as software interrupts. 
These interrupts would be set by the CPU writing into the 
Am29114 interrupt register. 


The interrupt mode is set for capturing asynchronous low 
going pulses as interrupt signals. This is done because 
most of the interrupt signals are only guaranteed to be 
active for a single clock cycle. Therefore, the interrupts 
must be latched and held by the interrupt controller until 
acknowledged by the CPU. 
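
The priority ordering and the latching behavior described above can be sketched in Python. This is an illustrative model only: the names `InterruptSource` and `InterruptLatch` are hypothetical, the numbering (0 = highest priority) is an assumption, and the relative order of the three parity inputs is assumed, since the text only says they are the three highest.

```python
from enum import IntEnum


class InterruptSource(IntEnum):
    """Interrupt inputs in priority order; 0 is assumed to be the highest."""
    PARITY_D_BUS = 0       # parity error on the D_BUS
    PARITY_MULTIPLIER = 1  # parity error from the Am29C323
    PARITY_ALU = 2         # parity error from the Am29332
    FPP_STATUS = 3         # Overflow, Underflow, or Invalid from the FPP PAL
    STACK_A_FULL = 4       # A_FULL from the Am29331 sequencer
    HOST_BUS = 5           # external bus interrupt from the host interface
    SOFTWARE_0 = 6         # unused by hardware, software interrupt
    SOFTWARE_1 = 7         # unused by hardware, software interrupt


class InterruptLatch:
    """Holds single-cycle pulses until the CPU acknowledges them."""

    def __init__(self):
        self.pending = set()

    def pulse(self, source):
        # A one-clock pulse is captured and held.
        self.pending.add(source)

    def highest_pending(self):
        # Lowest enum value means highest priority in this sketch.
        return min(self.pending) if self.pending else None

    def acknowledge(self, source):
        # The CPU has serviced the request; release the latch.
        self.pending.discard(source)
```

Because each pulse is held until acknowledged, a lower priority request that arrives first is not lost when a higher priority one arrives later.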


The D_BUS is connected to the interrupt controller data 
pins so that the internal interrupt, mask, and in-service 
registers may be read and written. 


The interrupt controller is selected and given instructions 
via outputs of the control pipeline register. 


Interrupt Sequence 


During a given clock, one of the interrupt inputs goes 
active. At the end of that cycle (active edge of clock), the 
interrupt signal is clocked into the interrupt register of the 
Am29114. 


During the second clock cycle, the interrupt is ANDed 
with the interrupt mask register and, if the interrupt is 
allowed, its priority is compared to any currently in-service
interrupt. If the new interrupt is of higher priority
than any in-service interrupt, the MINTR* (interrupt request)
will go active at the next active clock edge.


During the third clock cycle, the Am29114 interrupt
request is externally ORed with the interrupt request from 
the trap logic. The combined interrupt request is then 
loaded into a delay flip flop. The delay flip flop is needed 
to synchronize the final interrupt request with the system 
clock. The reason for this is that the interrupt request from 
the Am29114 is stable too late (41 ns) in the third cycle 
to be useful in selecting an interrupt address. The set-up 
time for the microcode control store address could not be 
met if the Am29114 interrupt request were used directly 
with the Am29331 sequencer. 


The external OR and delay functions are implemented
in an AmPAL22V10A, whose logic is shown in
Figure 5-12.


During the fourth clock cycle, the INTR* (interrupt re- 
quest) input of the sequencer is driven by the delay flip 
flop. The sequencer then returns INTA* (interrupt ac- 
knowledge) if micro-interrupts are allowed. The INTA* 
signal enables the interrupt vector onto the microcode 
control store address lines. 


The LSB three bits of the interrupt vector are provided by 
the Am29114 interrupt priority encoder. Bit 3 of the 
interrupt vector is provided by the trap logic. The bit is low 
for an interrupt and high for a trap vector. The upper bits 
(4:11) of the vector are provided by an external 
Am29818-1 register. This register provides a variable 
base address for a nine entry point table look-up (multi- 
way branch), which is based on the four bits of interrupt 
vector from the Am29114. The Am29818-1 register is
loaded via the D_BUS or through the diagnostics SSR 
chain. The need for a nine entry point table is explained 
in the section on trap operation. 
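
As a rough illustration of how the vector bits described above combine into a control store address, the following Python sketch assembles the 12-bit value. The function name `interrupt_vector` is hypothetical, and treating the low three bits of a trap vector as zero in the usage below is an assumption; the text only fixes bit 3 and the base address.

```python
def interrupt_vector(base, priority_code, is_trap):
    """Assemble the 12-bit control store address for an interrupt or trap.

    base          -- 8-bit value held in the Am29818-1 register (bits 4:11)
    priority_code -- 3-bit code from the Am29114 priority encoder (bits 0:2)
    is_trap       -- True for a trap (bit 3 high), False for an interrupt
    """
    if not 0 <= base <= 0xFF:
        raise ValueError("base must fit in 8 bits")
    if not 0 <= priority_code <= 0x7:
        raise ValueError("priority_code must fit in 3 bits")
    return (base << 4) | (int(is_trap) << 3) | priority_code


# Hypothetical base address 0x12: interrupt 5 and a trap land in the same table.
interrupt_entry = interrupt_vector(0x12, 5, is_trap=False)
trap_entry = interrupt_vector(0x12, 0, is_trap=True)
```

With this layout the eight interrupt entry points occupy offsets 0 through 7 from the base, and the trap entry point sits at offset 8, which accounts for a nine entry table.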


During the fifth clock cycle of the interrupt sequence, the 
first instruction of the interrupt routine will execute. Dur- 
ing this cycle the interrupt return address will be pushed 
onto the sequencer stack. 


In summary, from the time an interrupt signal becomes
active until the interrupt service routine begins execution,
four instructions in the main program will complete
execution.
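
The five-cycle sequence can be restated as a small Python table. This is only a summary of the steps above in data form; the names `INTERRUPT_SEQUENCE` and `main_program_instructions_before_service` are hypothetical.

```python
# (clock cycle, what happens in that cycle), restating the text above.
INTERRUPT_SEQUENCE = [
    (1, "interrupt input goes active; latched into the Am29114 at end of cycle"),
    (2, "request is masked and prioritized; MINTR* goes active at end of cycle"),
    (3, "MINTR* is ORed with the trap request and loaded into the delay flip flop"),
    (4, "sequencer receives INTR*, returns INTA*; vector drives the control store"),
    (5, "first instruction of the interrupt routine executes"),
]


def main_program_instructions_before_service(sequence=INTERRUPT_SEQUENCE):
    """Count the main program instructions that complete before service begins."""
    first_service_cycle = sequence[-1][0]
    # Every cycle before the service routine starts still runs main program code.
    return first_service_cycle - 1
```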
Figure 5-12. U75 AmPAL22V10A Trap Logic PAL


Trap Operation


Trap Issues


A trap requires extremely fast response to the trap event
signal.


The ideal situation is for the trap event signal to cause the
abortion of the instruction in execution at the time the 
event signal appears. 


This is extremely difficult in a high clock frequency 
system. To succeed, the trap event signal must be stable 
at least in time to prevent clocking of the data section of 
the CPU, which would otherwise change the system 
state (i.e., complete execution of the instruction). This 
implies that the trap event signal is stable one clock
control circuit set-up time before the high-to-low edge of
the system clock. The high-to-low edge of clock is significant,
because once the clock signal falls, the writing of
any write enabled port on the Am29334 register file will 
begin. In addition, the trap event signal must be stable in 
time to cause the Am29331 sequencer force continue 
(FC), interrupt request (INTR), and carry in (CIN*) signals
to go high soon enough to disable the sequencer micro- 
program address in time to meet the set-up time require- 
ments of the microcode control store. 


In a 100 ns cycle time system, such as the one being 
discussed here, the trap event signal must be valid no 
later than 25 ns into the cycle. For a trap event signal
that is to be derived from the effects of the instruction in 
execution in that cycle, this requirement is very difficult 
to meet. 


Fortunately there are trap events that may be signalled 
in the one or two cycles previous to the cycle in which the
trap must occur. Some examples would be: a cache miss
that may be detected from the cache address created in
a cycle prior to that in which the cache data is used in a
calculation; or a breakpoint in which the breakpoint target 
instruction address is detected by the sequencer in the 
cycle prior to the instruction being loaded into the control 
pipeline for execution. 


If an instruction is a known potential trap, it is possible
to execute the instruction so that no critical information is 
destroyed by completing its execution. This may be done 
by writing results back to a temporary register while 
allowing no other significant system state changes, such 
as updating the ALU Q register, or doing a return from 
procedure call. The instruction may then be allowed to 
execute and generate any trap event signals that might 
result from the execution, without concern for irrevocably 
destroying data because of some error condition. 


In the above examples, the trap event signal may be 
loaded into a delay flip flop to synchronize the trap 
request with the beginning of the following cycle. This 
causes the trap operation to occur early in the cycle 
following the event and to complete successfully. 


The only trap condition implemented in this design is the 
breakpoint. 


Trap Logic 


By definition, the response time between trap event 
signal and trap operation must be much faster than the 
four or more cycles that an interrupt takes to begin 
execution. This requires that the trap logic be different 
from the Am29114 interrupt controller. The trap logic 
design is implemented in an AmPAL22V10A. The logic is
shown in Figure 5-12. The definition file for the PAL is 
shown in Appendix J. 


The trap logic is in effect a simpler and faster interrupt 
controller. This “trap controller” is cascaded with the
Am29114 interrupt controller so that the same address 
vector approach used with the interrupt controller may be 
extended to trap operations. 


A trap is treated as a special form of interrupt with a higher
priority. When a trap occurs, the trap logic generates a 
cascade out (CASOUT2) signal to the Am29114 to 
prevent any interrupt operation from beginning in the 
same cycle. 


The trap logic also generates an INTR signal to the 
Am29331 sequencer. The INTR signal in turn causes the 
sequencer to three-state its microcode address outputs
and return an INTA signal to the trap logic. The INTA
signal enables a four bit vector from the trap logic and the
interrupt base address from the Am29818-1 registers as
shown in Figure 5-11.


The above steps essentially generate an interrupt and 
provide the interrupt vector. What makes a trap different
is that the Trap Logic is also used to drive the Am29331 
sequencer Force Continue and Carry-In inputs. This 
causes the sequencer to ignore the instruction being 
trapped and to perform a continue instruction instead, 
which changes no state in the sequencer. The CIN* 
signal’s being high causes the trapped instruction ad- 
dress to not be incremented. Therefore, the trapped 
instruction’s address will be loaded into the sequencer 
interrupt return address register. In addition, the TRAP 
signal is used to prevent any state change in the system 
other than in the sequencer, effectively aborting the 
trapped instruction.


Following are some other features to note in the trap
logic. 


Am29300 system RESET is used to generate the se- 
quencer Carry-In signal (SEQ_CIN*). This is done to 
force SEQ_CIN* high during reset so that the first microc- 
ode instruction executed after reset will be at address 
zero rather than one. 


In order for a trap operation to take effect, the instruction 
that is to be trapped must have its microcode interrupt
enable bit active. This bit is used as the interrupt enable
to the sequencer. If it is not active, then the microcode
control store address from the sequencer will not be
three-stated, and the interrupt vector will not be substi- 
tuted. In addition, the TRAP signal will still occur, causing 
the trap target instruction not to execute correctly. Note 
that the interrupt enable bit could be externally forced 
active by the trap operation via an OR gate. But the added 
delay could cause the interrupt acknowledge to be too 
late to allow the interrupt vector address to meet required 
set-up times. (Of course, it is possible to design the 
system so that every trap causes all the system clocks to
be stopped for one cycle. That would allow enough time 
for all kinds of tricks to be played. This design, however, 
will not explore that approach.) 
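
The trap behavior described in this section can be modeled loosely in Python. This is a behavioral sketch, not the PAL equations from Appendix J: the function names are hypothetical, and signal polarities are simplified so that `True` means the effect described in the text is taking place.

```python
def trap_logic(trap_event, interrupt_enable_bit):
    """Return the effects of the trap logic for one cycle (simplified polarities)."""
    return {
        "INTR": trap_event,            # request vector substitution
        "FORCE_CONTINUE": trap_event,  # sequencer ignores the trapped instruction
        "CIN_HIGH": trap_event,        # CIN* high: address is not incremented
        "TRAP": trap_event,            # blocks state changes outside the sequencer
        # The vector only replaces the sequencer address if the trapped
        # instruction has its microcode interrupt enable bit active.
        "vector_substituted": trap_event and interrupt_enable_bit,
    }


def return_address(instruction_address, cin_high):
    """Address saved in the sequencer interrupt return address register."""
    return instruction_address if cin_high else instruction_address + 1
```

With CIN* held high, the saved return address is that of the trapped instruction itself, so a return from the trap routine re-executes it. If the interrupt enable bit is inactive, `TRAP` still blocks the instruction but no vector is substituted, which is the failure case the text warns about.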


MICROCODE CONTROL STORE AND 
CONTROL PIPELINE REGISTER


Control Store Function 


The microcode control store is the high speed memory 
that contains the control bits comprising the instructions 
that the system may execute. 


This system uses what is called “horizontal” microcode. 
Each microinstruction contains many control bits that 
manage a variety of different functions in parallel. “In 
parallel” is the key phrase. All the control information 
needed to manage the entire Am29300 system during 
the execution of one microinstruction is contained in one 
word of microcode control store. 


The memory must be fast because its access time must 
be significantly shorter than the cycle time of the system. 
In general the access time must be less than half the 
cycle length. This is because of the time required by the 
sequencer to generate each new address to the control 
store, which takes up the remaining time in the cycle. 


Pipeline Register Function 


At the output of the microcode control store there is a
register to hold the control information stable during the
execution of an instruction. With the control information
held in the pipeline register, the control section of the
CPU is free to begin reading the next microinstruction
from the control store. In this way, the control section is
operating in parallel with the data section. The control
section fetches the next instruction while the data
section executes the current instruction. This parallel
operation, where one section of the system works on one
step of a problem while another section works on the
next step, is called pipelining, hence the name for the
pipeline register.


Through parallel operation, pipelining nearly doubles the 
speed of the system over what might be the case if the 
control section and data section were directly tied to- 
gether in a serial fashion. 
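
The "nearly doubles" claim can be checked with a small Python calculation. The model below is a simplification that ignores the delay of the pipeline register itself, and the delay figures in the usage lines are hypothetical values chosen for illustration, not measurements of this system.

```python
def cycle_time_serial(t_control_ns, t_data_ns):
    """Control fetch and data execution happen one after the other."""
    return t_control_ns + t_data_ns


def cycle_time_pipelined(t_control_ns, t_data_ns):
    """Fetch and execute overlap, so the slower section sets the cycle."""
    return max(t_control_ns, t_data_ns)


def speedup(t_control_ns, t_data_ns):
    """Ratio of serial to pipelined cycle time."""
    serial = cycle_time_serial(t_control_ns, t_data_ns)
    return serial / cycle_time_pipelined(t_control_ns, t_data_ns)


balanced = speedup(50, 50)    # both sections take the same time
unbalanced = speedup(40, 60)  # data section is the slower one
```

The speedup reaches 2 only when the two sections are perfectly balanced; any imbalance, plus the register overhead ignored here, is why the gain is "nearly" double.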


Control Store Implementation 


Because this method of pipelining the output of a mi- 
crocode store is so popular, there are special memories 
available that combine a high speed memory with a 
pipeline register at its output. These combined memory 
and pipeline devices may significantly reduce the 
system parts count. 


These memories are available as either RAM or 
PROM devices. RAM versions are used to make
writable control stores. 


These memories also include Serial Shadow Registers 
(SSR) along with the pipeline register. This allows diag- 
nostic routines to read and control the pipeline register 
outputs. Where RAM versions are used, the SSR is used 
as a built in means to load the writable control store. 


This system is designed to use one of the following for 
control store: Am9151-50, 1K x 4 RAM; Am27S65, 
1K x 4 PROM; Am27S75, 2K x 4 PROM; or 
Am27S85, 4K x 4 PROM. These devices all share a
similar pinout so that simple jumper connections allow 
any of them to be placed in the same sockets. 


The connections to the control store are shown in Figures 
5-13 and 5-14. 


A total of 23 memories are used to form the needed 92- 
bit-wide microcode words. 


Because this system is designed to use no more than a
4K word deep control store, only the lower 12 bits of
microcode address from the sequencer are connected.
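
The device count and address width quoted above follow from simple arithmetic, sketched here in Python. The helper names are hypothetical.

```python
def devices_needed(word_width_bits, device_width_bits=4):
    """Number of N-bit-wide memories needed to build one microcode word."""
    return -(-word_width_bits // device_width_bits)  # ceiling division


def address_bits_needed(depth_words):
    """Address lines needed to reach every word of a store of this depth."""
    return (depth_words - 1).bit_length()
```

For the 92-bit word built from 4-bit-wide devices this gives 23 memories, and a 4K word deep store needs 12 address bits; the 1K and 2K devices need 10 and 11.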


The memories in the control store which provide the
microcode branch field are connected differently from the
remaining memories. This is because the branch field
outputs are connected to the D_BUS and must be three-stated
when other devices drive the D_BUS. All the other
outputs of the control store are always output enabled.


Figure 5-14. Microcode Control Store


Figure 5-13 shows how the bulk of the control store is 
connected. 


When the Am9151-50 or the Am27S65 is used, the 
jumper at location “B” is connected. This continuously 
enables the memory. 


When the Am27S75 is used, the jumpers at locations A 
and D are connected. Also, the Am27S75 G/Gs* (pin 20) 
is internally programmed as an asynchronous enable. 
Those jumper connections will always enable the mem- 
ory and connect address bit 10 to it. 


When the Am27S85 is used, the jumpers at locations A 
and C are connected. The Am27S85 G/Gs/I/Is* (pin 19) 
is programmed as a synchronous initialize function. 
Those connections will always enable the memory and 
provide address bits 10 and 11 to it. 


Figure 5-14 shows the connection for the memories 
that support the branch field. 


When the Am9151-50 or the Am27S65 is used, the 
jumpers at locations B and E are connected. This enables
the memory when the control pipeline selects the control 
store to drive the D_BUS. 


When the Am27S75 is used, the jumpers at locations A, 
D and E are connected. Also, the Am27S75 G/Gs* (pin 
20) is internally programmed as an asynchronous en- 
able. Those jumper connections will enable the memory 
when the control pipeline selects the control store to drive 
the D_BUS. 


When the Am27S85 is used, the jumpers at locations A, 
C, and F are connected. The Am27S85 G/Gs/I/Is* (pin 
19) is programmed as an asynchronous enable function. 
Those connections will enable the memory when the 
control pipeline selects the control store to drive the 
D_BUS. Also, these connections imply that when the 
Am27S85 is used, the branch field of the initialize word 
will not be valid.
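
The jumper settings listed above can be collected into a lookup table. The Python sketch below only restates the text; the names `JUMPERS` and `jumpers_for` are hypothetical.

```python
# device: (jumpers for the bulk of the control store, Figure 5-13,
#          jumpers for the branch field memories, Figure 5-14)
JUMPERS = {
    "Am9151-50": ({"B"}, {"B", "E"}),
    "Am27S65": ({"B"}, {"B", "E"}),
    "Am27S75": ({"A", "D"}, {"A", "D", "E"}),
    "Am27S85": ({"A", "C"}, {"A", "C", "F"}),
}


def jumpers_for(device, branch_field=False):
    """Return the sorted jumper locations to connect for a given device."""
    bulk, branch = JUMPERS[device]
    return sorted(branch if branch_field else bulk)
```

A call such as `jumpers_for("Am27S85", branch_field=True)` returns the locations to connect for the branch field sockets when the 4K x 4 PROM is fitted.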


CLOCK CONTROL 


In almost every complex digital system there is a need to 
control and qualify selectively the system clock. 


A register often needs a qualified clock that will clock (i.e.,
load) the register only when specified by some control
signal. Sometimes a register will internally qualify its own
clock by providing a load enable input. But most often,
registers have only data inputs and outputs, an output
enable, and an unqualified clock input. It is up to the
system designer to provide a means to restrict the clock
to the register so that it receives clock only on those
cycles when its load enable control signal is active.


Restricting a clock in this fashion is referred to as quali- 
fying a clock. The controlling signal that enables the 
qualified clock is called the qualifier. 


Most synchronous digital systems have a system clock 
with a single active edge. This means that the system 
state will only change on either the low-to-high or high-to-low
edge of the clock. The opposite transition of the clock
will have no state changing effect in the system. The
opposite transition of the clock is referred to as the
inactive edge of the clock. It should be noted, however,
that, even though there is a single active edge for the
clocking of registered states in the system, the level of the
clock may have an effect on some multiplexers or latches
in the system. The level of the clock may control the path
selected by a multiplexer, whether a latch is flow-through 
or held, or the write enable of a memory. 


To qualify a clock, there must be a way to prevent the 
active edge from occurring. This implies that the clock is 
held either high or low when it is prevented from cycling. 
The choice of whether the clock will be stopped (held) at 
its high level or low level may depend on what, if any, 
effect the level of the clock has on system multiplexers, 
latches, or memories. For example, if the low level of the 
clock enables a memory write line, it may be preferred to 
stop the clock at the high level rather than the low level to 
prevent any change in state of the memory. 


Clock Qualification Circuit 


In the Am29300 system described here, the system clock
will be stopped at the high level. This is because the low
level of the clock may start the writing of data into the
Am29334 register file. The active edge of the clock will be
the low-to-high transition.


This method of qualifying clocks is referred to as ‘OR’ 
qualification. Usually with this method the free-running 
(unqualified) version of the system clock is ‘ORed’ with a
low active enable signal. Thus, if the enable is active (low)
the resulting qualified clock is allowed to track the free
running clock. If the enable is inactive (high) the qualified
clock will be forced high, stopping the clock, until the
enable again goes active. Because the free running clock
is always high during the first portion of each clock cycle,
the clock enable signal need not be stable until just before
the inactive edge of the free running clock.


In this Am29300 demonstration system the following are
the desired controls over the system clocks: 


1. The ability to stop all clocks to the Am29300 CPU, 
both control and data sections. This will suspend 
operation of (halt) the system. 


2. The ability further to qualify register loading 
(register clocks) with control pipeline signals. 
The controlled registers would be the Macro 
Status, Macro Opcode, and Interrupt Base 
Address register. 


3. The ability to single step all the system clocks
when the system clocks are in the halt mode. Note
this implies only conditional single stepping on
those register clocks that are further qualified by
load enable controls.


4. The ability to single step the data section or the 
control section independently. 


5. The ability to force the control pipeline or the 
Macro Status, Macro Opcode, and Interrupt 
Base Address registers to load. This capability 
is used to implement diagnostic control over 
these registers.


To implement this kind of control over the system clocks,
a separately qualified version of the system free running
clock must be created for each differently handled register.
The general clock for the control section is different
from that for the data section. Also, each qualified register
clock is different.


The block diagram for the clock qualification circuit is
shown in Figure 5-15. The logic equation definition file
for the PAL in this circuit is shown in Appendix K.


The qualifiers for the system clocks come from either the
control pipeline, trap logic or the host interface controller. 
The AmPAL22V10A Programmable Array Logic (PAL) 
device is used to combine the various qualifiers into the 
appropriate clock enables for each differently handled 
set of registers. The output of the PAL is then logically 
ORed with the system free running clock to form the 
various qualified clocks in the system. 


In this system, the free running clock generator produces
an active low clock with the enables active high. By using 
negative logic OR gates (NAND gates) the clock and 
enable signals are logically ORed together to produce 
active high qualified clocks. The negative logic OR gates 
are external to the clock qualifier PALs. 
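
The equivalence between the general ‘OR’ qualification and the NAND gate form used in this system can be checked with a short Python sketch. Signals are modeled as 0 or 1, and both function names are hypothetical.

```python
def or_qualified_clock(clock, enable_n):
    """General form: active high clock ORed with an active low enable."""
    return clock | enable_n


def nand_qualified_clock(clock_n, enable):
    """This system's form: active low clock, active high enable, NAND gate."""
    return 1 - (clock_n & enable)


def held_high_when_disabled():
    """With the enable inactive, the output stays high for either clock level."""
    return all(nand_qualified_clock(clock_n, 0) == 1 for clock_n in (0, 1))
```

By De Morgan's theorem the NAND of the inverted clock and the active high enable equals the OR of the true clock and the active low enable, so the qualified clock tracks the free running clock when enabled and is held high when it is not.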
Figure 5-15. Clock Qualification Block Diagram


The NAND gates also serve as high output current
buffers that allow the qualified clocks to drive many
registers in the system. These NAND buffers also cause
the clocks to have very high speed edges. This requires 
that clock lines be handled more carefully than other 
signal lines to help prevent noise, reflections, and ringing 
on the clock lines. Preventing these problems helps to 
ensure clean clock signals free from the glitches that may 
cause missed clocking or double clocking of registers. It 
is suggested that clock lines be routed serially, kept less 
than 12 inches in length, and terminated to the printed
circuit board’s characteristic impedance at the last point
of use on each clock line.


Note that all the system clock lines, even the free-running
clock line, pass through a NAND gate. This is done to 
equalize the delay of all clocks so that clock skew in the 
system is minimized. 


Clock Generator 


The unqualified (free running) source for all the clocks in 
the system comes from a clock generator implemented in
an AmPAL16R6B. A diagram of the logic implemented in
this PAL is shown in Figure 5-16. The logic equation 
definition file for this PAL is shown in Appendix L. 






Figure 5-16. U100 AmPAL16R6B Clock Generator (inputs shown: P-CLKLEN(1), P-CLKLEN(0), 30 MHz clock)


The only reason that a clock generator PAL is used in 
addition to a simple clock oscillator module is to provide 
the ability to vary dynamically the length of each system 
clock cycle. This ability allows the system to run at the
maximum clock rate most of the time when the fastest
data paths are in use and to run at a slower rate only when
slower system data paths are in use. By slowing the 
system cycle time dynamically only when a slow data 
path is used, the average system speed is much higher 
than would be the case if the system clock rate were fixed 
at the rate required by the slowest data path. 


A simple way to do this would be to divide the normal 
system clock by two and on each cycle select whether 
the normal length or the double length clock cycle would 
be used. 


In this system, finer control over the length of each cycle 
is desired. Where the cycle need only be a little longer 
than usual, only a slightly longer cycle is used rather than 
doubling the cycle length. 


This is done by dividing down a high speed clock, which 
runs three times faster than the normal system clock. It is
then possible to extend a clock cycle in increments of the
high speed clock. A cycle then may be 1, 1 1/3, 1 2/3, or
2 times the normal cycle length.


The Am29300 demonstration system’s normal clock is
10 MHz, giving a 100 ns cycle. The high speed clock is then
30 MHz and is provided by a commercially available
clock oscillator module.


The control over the cycle length comes from the control 
pipeline register and may thus be specified differently on 
each instruction. Two bits are provided to select one of 
the four cycle lengths. Each instruction may thus control 
its own cycle length based on the time required by the 
data paths that are used. 
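
The four selectable cycle lengths can be computed directly from the 30 MHz high speed clock. The Python sketch below assumes the two control bits encode the number of extra high speed periods (0 to 3), which matches the later description of the clock generator; the names are hypothetical.

```python
from fractions import Fraction

HIGH_SPEED_CLOCK_HZ = 30_000_000  # 30 MHz oscillator module
BASE_PERIODS = 3                  # normal cycle: three high speed periods


def cycle_periods(length_bits):
    """High speed clock periods in one system cycle for a 2-bit length code."""
    if length_bits not in (0, 1, 2, 3):
        raise ValueError("length_bits must be 0..3")
    return BASE_PERIODS + length_bits


def cycle_length_ns(length_bits):
    """Exact system cycle length in nanoseconds."""
    return Fraction(cycle_periods(length_bits) * 1_000_000_000, HIGH_SPEED_CLOCK_HZ)
```

Codes 0 through 3 give 100 ns, 133 1/3 ns, 166 2/3 ns, and 200 ns, that is 1, 1 1/3, 1 2/3, and 2 times the normal cycle.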


The waveform of the clock may be described in terms of 
the number of high speed clock periods during which it is 
active and then inactive. 


Note that the output of the AmPAL16R6 is inverting. The
logic internal to the PAL creates an “active high” clock 
with a low-to-high active edge. This waveform is inverted 
by the final output of the PAL and is later inverted once 
more in the clock qualifying circuit. The final system 
clocks are thus active high. When describing any system
clock, it will be done in terms of an active high clock. The 
clock generator waveform is shown in Figure 5-17, 
where the outputs are shown active high, even though 
the actual PAL output is inverted. 


Each clock cycle has two or more active periods followed 
by one inactive period. 













The clock generator PAL output is from a D flip flop. When
the flip flop output is inactive (low), one term feeds back
the inverted output. This will force the flip flop high on the
next high speed clock. The output of this flip flop feeds a
shift chain of four other flip flops, which act as a simple
timer for the extended cycle lengths.


During the first active period of the clock output, the 
output of the first flip flop in the timing chain is still inactive. 
This first flip flop’s output is inverted and fed back into the 
clock output flip flop to force the clock output to remain
high for a second active period. 


During the second active period, the clock cycle length 
bits from the control pipeline become stable and deter- 
mine whether additional active periods will be inserted 
into the output clock. 


Note that since the first two periods of active clock are
forced by the logic, the control bits need not be stable until
two high speed clock periods minus the PAL set-up time
have elapsed (66.6 ns - 15 ns = 51.6 ns). This time margin is
further reduced by the skew between the high speed clock and
the qualified clock to the control pipeline, which is equal to
the clock-to-output time of the clock generator PAL plus
the propagation delay of the qualifying NAND gate
(51.6 ns - (10 ns + 5.5 ns) = 36.1 ns). Therefore, as long
as the control pipeline register clock-to-output time does
not exceed 36 ns, the clock generator will work as
described here.
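
The margin calculation above can be reproduced in Python. The constants are the figures quoted in the text; the names are hypothetical. Note that the text truncates two 33.3 ns periods to 66.6 ns, so the exact result differs slightly in the last digit.

```python
HIGH_SPEED_PERIOD_NS = 1000 / 30   # one period of the 30 MHz clock
PAL_SETUP_NS = 15                  # clock generator PAL set-up time
PAL_CLOCK_TO_OUTPUT_NS = 10        # clock generator PAL clock-to-output time
NAND_PROPAGATION_NS = 5.5          # qualifying NAND gate delay


def control_bit_margin_ns():
    """Time allowed for the cycle length bits to become stable."""
    window = 2 * HIGH_SPEED_PERIOD_NS - PAL_SETUP_NS
    skew = PAL_CLOCK_TO_OUTPUT_NS + NAND_PROPAGATION_NS
    return window - skew
```

The result is about 36.2 ns before truncation, consistent with the 36 ns limit stated for the control pipeline register clock-to-output time.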
Figure 5-17. Clock Generator Outputs (Inverted)


If the clock cycle length bits are zero, no additional
feedback terms are enabled and the clock output flip flop 
will go low in the next high speed clock period. 


If the clock cycle length bits equal 1, the output of the 
second timing chain flip flop is fed back to the output flip 
flop to allow one additional active clock period. 


Similarly, when the clock cycle length bits are equal to 2
or 3, an additional 2 or 3 active periods are inserted in the
output clock waveform.


When the clock output flip flop again goes inactive, its 
output will force all of the timing chain flip flops to be 
cleared, thus beginning a new Am29300 clock cycle. 
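
The clock generator behavior can be simulated with a few lines of Python. This is a behavioral reconstruction from the prose above, not the PAL equations in Appendix L: the feedback terms and the function name `simulate_cycle` are assumptions, and the output is shown active high as in Figure 5-17.

```python
def simulate_cycle(length_bits):
    """Return the clock level in each high speed period of one system cycle.

    The simulation starts with the clock output low and the four timing
    chain flip flops cleared, and stops when the clock goes low again.
    """
    clk, chain = 0, [0, 0, 0, 0]
    levels = []
    while True:
        # Terms that hold the clock high: the forced second period, then
        # one extra period per step of the cycle length code.
        keep_high = clk and (
            not chain[0]
            or (length_bits >= 1 and not chain[1])
            or (length_bits >= 2 and not chain[2])
            or (length_bits >= 3 and not chain[3])
        )
        next_clk = 1 if (not clk or keep_high) else 0
        # Shift chain fed by the clock output; a low clock clears it.
        next_chain = [clk, clk & chain[0], clk & chain[1], clk & chain[2]]
        clk, chain = next_clk, next_chain
        levels.append(clk)
        if clk == 0:
            return levels
```

For a length code of 0 the model yields two active periods followed by one inactive period; each increment of the code adds one active period, up to five active periods for a code of 3.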


MICROCODE WORD 


This section describes the structure and function of each 
field of bits in this system’s microcode word. Included are 
some comments on how functions were determined and 
how they might vary in similar systems. 


Control Philosophy 


In a microprogrammed system, each word of the microcode
functions as the determinant of all system action
during one clock cycle of system operation. Each bit 
directly affects some aspect of the machine. Each field of 
bits may act independent of other fields to manage 
parallel data paths and simultaneous operations. This 
ability to manage parallel activities in each machine cycle 
gives a microprogrammed system high speed and flexi- 
bility. But the power of complete parallel control over 
nearly all the functions in a system comes at a cost. 


The cost is wide control memory words. Fifty- to 150-bit- 
wide control words are common in microprogrammed 
systems. Three hundred-bit-wide control words have 
been used in large mainframe computers for years. 


With each machine instruction’s eating up 100 or more 
bits of memory, it doesn’t take long to consume signifi- 
cant board space, power, and cost for high speed microc- 
ode memory. 


The resulting dilemma between the need for parallel 
control and the cost, size, and power that accompanies 
it, is the basis of many a system designer’s headache. 


The usual approach used to strike a balance between the 
opposing issues is to determine carefully which functions 
must absolutely be able to occur in parallel, then to limit
the microcode word size to that absolute minimum.
Control over other less frequently used functions or over
alternate operations is then overlapped with the primary
control fields.


Overlapping of control fields means that during certain 
operations, the meaning of the bits in the overlapped 
control field changes. The hardware controlled by the 
primary meaning of an overlapped field must be dis- 
abled during the time that the alternative meaning is in 
effect. This of course means that the functions con- 
trolled by the overlapped fields cannot occur in the 
same machine cycle. 


This results in winning a little and losing a little. More 
control and thus more functions may be managed with 
less control memory, but some operations then take 
multiple cycles to complete, due to the use of functions 
that may not be managed in one instruction. Also, the 
need to enable and disable control field meanings and 
the associated hardware, will add control bits and decod- 
ing logic. The decode logic adds delay into the machine 
cycles and will cause the system to run a little slower. 


Additional savings in control word size may be made by 
encoding fields rather than having each bit directly drive 
a control signal. This again adds decoding logic and its 
associated delay. 
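

The saving from encoding can be estimated with a small calculation. The following is a minimal sketch, not part of the original design: it compares a field with one bit per control signal against an encoded field that selects at most one of N mutually exclusive signals. The function names and the choice of code 0 as an "idle" code are assumptions made for illustration.

```python
def direct_width(num_signals: int) -> int:
    """One microcode bit drives each control signal directly (no decoder)."""
    return num_signals


def encoded_width(num_signals: int, include_idle: bool = True) -> int:
    """Bits needed to select at most one of num_signals exclusive signals.

    include_idle reserves one extra code that means "no signal active".
    """
    codes = num_signals + (1 if include_idle else 0)
    return max(1, (codes - 1).bit_length())


def decode(code: int, num_signals: int) -> list:
    """Model of the added decode logic: encoded value -> one-hot lines.

    Code 0 is the idle code; code k (1..num_signals) activates line k-1.
    """
    return [code == k + 1 for k in range(num_signals)]


if __name__ == "__main__":
    for n in (4, 7, 8, 16):
        print(n, direct_width(n), encoded_width(n))
```

The encoded field is narrower, but every signal now passes through the decoder, which is the added delay the text describes, and no two signals in the field can be active in the same cycle.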


The job of deciding what control must be parallel and 
what must be overlapped is more art than science. No 
matter how the microcode word is defined, there will 
always be other interesting ways to rearrange and over- 
lap the control fields. Each way will cost something either 
in word width or control decoding, thus providing endless
trade-offs. 


All these possible variations make it extremely important 
to have a thorough understanding of the algorithms to be 
handled by a particular machine. The better the under- 
standing, the better the chance to optimize the system 
architecture and control to solve the problem at hand. 


Microcode Word Field Descriptions 


Throughout the figures that detail the design of this 
system, signals that travel from page to page have been 
given meaningful names that imply the function of the 
signal. This helps in understanding what is going on in 
each figure. Many of these signals are the direct outputs 
of the control store pipeline register. As it turns out, many 
of the bits in the microcode carry multiple meanings 
because the functions of several fields are overlapped to
save microcode word size.


The result is that more than one signal name may often be
associated with a particular bit of the control pipeline. 
Physically, of course, all signal lines that ultimately con- 
nect to a particular pipeline bit are one piece of wire. The 
logical separation of lines, by using different names, only 
helps to understand the function of a given signal, when 
the hardware that uses the signal is enabled. The follow- 
ing three Figures show the physical and logical relation- 
ships between the microcode control store bits and the 
signal names (meanings) that are attached. 


Each Figure is split into pairs of columns preceded by 
one column that indicates the individual bit numbers for 
each signal. Each column pair contains a Field Name 
column that describes the function of the bit and a Signal 
Name column that gives the signal name used throughout
the Figures in this document for that meaning. The leftmost
column pair shows the primary meaning of the
control bits. Other column pairs to the right give alternate 
(overlapped) meanings for the control bits along with the 
signal name used with each meaning. 


Unless a control bit is overlapped with an alternate 
meaning in one of the columns to the right, the function 
of the control bit is constant. 


Register File Controls 


Figure 5-18 shows the microcode word bits that affect 
the Am29334 register file. 


It was decided that a three address machine would be the 
most appropriate way to obtain the best performance 
from the Am29300 family components. Because of the 
common three bus architecture these parts share, a 
three address register file fits nicely. Two addresses are 
used to read an A and B operand from the file while the 
third address specifies an independent write location. 
This allows writing back results without requiring the 
destruction of one of the read operands in a single cycle. 


An address multiplexer on the C operand register ad- 
dress does allow for two and one address operations by 
allowing either the A or B operand address to be used for 
the write operand address in addition to its use as a read
operand. 


Also, to support macroinstruction execution, address 
multiplexers are used on the read addresses so that 
macroprogram supplied register addresses may be di- 
rected to the register file. When macroprogram supplied 
addresses are in use, the meaning of the register ad- 
dress fields changes to control signals for the macro 
operand address counters. With this alternate meaning, 
the macro addresses may be incremented or decre- 
mented at the end of each cycle. 


Bits 91 and 84 select whether the microcode or the macro 
opcode addresses are directed to the register file. If 
either bit is high, the alternate definition for the related 
address field takes effect, and the macro opcode address
is used. 


Bits 76 and 77 are used to select one of four addresses 
to be supplied to the A write port of the register file. The 
selections are as follows: 


Bit
77 76
0  0   C operand microcode address used.
0  1   A operand address, as specified by bit 91.
1  0   B operand address, as specified by bit 84.
1  1   C macro operand counter address used.


When any selection other than the C operand microcode
address is made, the C address field assumes the alternate
meaning for control of the macro operand counter.
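

The address selection described above can be sketched in software. This is a minimal model under stated assumptions, not the actual hardware: the bit positions are taken from Figure 5-18, the helper names (`field`, `register_file_addresses`) are hypothetical, and the macro operand counter values are passed in as plain integers.

```python
def field(word: int, hi: int, lo: int) -> int:
    """Extract pipeline bits hi..lo (inclusive) from a microcode word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def register_file_addresses(word: int, macro_a: int, macro_b: int, macro_c: int):
    """Return the (A read, B read, C write) addresses for one microinstruction.

    macro_a, macro_b, macro_c stand in for the current macro operand
    counter outputs (illustrative inputs, not real signal names).
    """
    # Bit 91 / bit 84 high selects the macro-supplied address.
    a = macro_a if field(word, 91, 91) else field(word, 90, 85)
    b = macro_b if field(word, 84, 84) else field(word, 83, 78)
    # Bits 77:76 choose the source of the C (write) address.
    c_sel = field(word, 77, 76)
    c = (field(word, 75, 70), a, b, macro_c)[c_sel]
    return a, b, c


if __name__ == "__main__":
    three_addr = (5 << 85) | (9 << 78) | (17 << 70)   # C select = 00
    print(register_file_addresses(three_addr, 40, 41, 42))
    two_addr = (5 << 85) | (9 << 78) | (0b01 << 76)   # C takes the A address
    print(register_file_addresses(two_addr, 40, 41, 42))
```

With C select at 00 the machine behaves as a three address machine; with 01 or 10 the write address follows the A or B address, giving the two address form.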


In addition to the three addresses used by the data 
section of the CPU, a fourth address is provided for the 
B write port of the register file so that data may be moved 
into the file via the second port while other calculations go 
on undisturbed. 


The address for this fourth port comes from a multiplexer 
that may select either the C operand microcode address 
or the C macro opcode address counter as the source. Bit
69 is the select input for this fourth address multiplexer. 


Bit 68 enables the register file A read port onto the 
A_BUS. If this bit is inactive and if the FPP seed register 
output is also inactive, the D_BUS to A_BUS transceiver 
is enabled so that constants, masks, and variables may 
be passed from the D_BUS to A_BUS. 


Bits 67 and 66 are used as the write enable controls for 
the two write ports of the register file. 


Data Path Controls 
The data path controls are shown in Figure 5-19. 


To provide a straightforward example of the usage of the 
PM and FPP, these devices have had their input and 
output buses paralleled with those of the ALU. In this 
arrangement it is not generally feasible to make use of 
more than one module in a given cycle. This is because 
the data buses may carry useful information to only one 
device at a time (this assumes that passing the same 
data to more than one device is of limited use). Also, only 
one device may drive the Y_BUS at a time. 


Figure 5-18. Am29300 Demonstration System Microinstruction Word Layout -- Register File Controls

Control   Primary                  Primary        Alternate 1          Alternate 1   Alternate 2   Alternate 2
Pipeline  Field Name               Signal Name    Field Name           Signal Name   Field Name    Signal Name
Bit #     Meaning                                 Meaning                            Meaning

P91       Reg A Macro/Micro*       P_ARA_MAC
          If P91 = 0 then primary                 If P91 = 1 then alternate 1
P90       Register A Address (5)   P_RA (5)
P89       Register A Address (4)   P_RA (4)
P88       Register A Address (3)   P_RA (3)
P87       Register A Address (2)   P_RA (2)
P86       Register A Address (1)   P_RA (1)       RA Count Direction   P_UP/DN_A
P85       Register A Address (0)   P_RA (0)       RA Count Enable      P_CNTA_EN
P84       Reg B Macro/Micro*       P_ARB_MAC
          If P84 = 0 then primary                 If P84 = 1 then alternate 1
P83       Register B Address (5)   P_RB (5)
P82       Register B Address (4)   P_RB (4)
P81       Register B Address (3)   P_RB (3)
P80       Register B Address (2)   P_RB (2)
P79       Register B Address (1)   P_RB (1)       RB Count Direction   P_UP/DN_B
P78       Register B Address (0)   P_RB (0)       RB Count Enable      P_CNTB_EN
P77       Reg C Add Source (1)     P_C_SEL (1)
P76       Reg C Add Source (0)     P_C_SEL (0)
          If P77:76 = 00 then primary             If P77:76 = 01, 10, 11 then alternate 1
P75       Register C Address (5)   P_RC (5)
P74       Register C Address (4)   P_RC (4)
P73       Register C Address (3)   P_RC (3)
P72       Register C Address (2)   P_RC (2)
P71       Register C Address (1)   P_RC (1)       RC Count Direction   P_UP/DN_C
P70       Register C Address (0)   P_RC (0)       RC Count Enable      P_CNTC_EN
P69       B Write Port Select      P_AWB_MAC
P68       A Bus Output Enable*     P_OEA*
P67       A Port Write Enable*     P_WEA*
P66       B Port Write Enable*     P_WEB*


Figure 5-19. Am29300 Demonstration System Microinstruction Word Layout -- Data Path Controls

Control   Primary                  Primary          Alternate 1           Alternate 1    Alternate 2  Alternate 2
Pipeline  Field Name               Signal Name      Field Name            Signal Name    Field Name   Signal Name
Bit #     Meaning                                   Meaning                              Meaning

P65       Data Path Select (1)     P_DPS (1)
P64       Data Path Select (0)     P_DPS (0)
          ALU when P65:64 = 00                      FPP when P65:64 = 10, 11             PM when P65:64 = 01
P63       ALU Instruction (8)      P_ALU_INST (8)   FPU Instruction (4)   P_FP_I (4)     TCX          P_TCX
P62       ALU Instruction (7)      P_ALU_INST (7)   FPU Instruction (3)   P_FP_I (3)     TCY          P_TCY
P61       ALU Instruction (6)      P_ALU_INST (6)   FPU Instruction (2)   P_FP_I (2)     ACC (1)      P_ACC (1)
P60       ALU Instruction (5)      P_ALU_INST (5)   FPU Instruction (1)   P_FP_I (1)     ACC (0)      P_ACC (0)
P59       ALU Instruction (4)      P_ALU_INST (4)   FPU Instruction (0)   P_FP_I (0)     RND          P_RND
P58       ALU Instruction (3)      P_ALU_INST (3)   ENR*                  P_ENR*         XSEL         P_XSEL
P57       ALU Instruction (2)      P_ALU_INST (2)   ENS*                  P_ENS*         YSEL         P_YSEL
P56       ALU Instruction (1)      P_ALU_INST (1)   ENF*                  P_ENF*         TSEL         P_TSEL
P55       ALU Instruction (0)      P_ALU_INST (0)   Feed Through (1)      P_FP_FT (1)    ENXA*        P_ENXA*
P54       Position Mac/Mic*        P_POS_MAC        Feed Through (0)      P_FP_FT (0)    ENXB*        P_ENXB*
P53       Position (5)             P_POSITION (5)   IEEE/DEC*             P_IEEE/DEC*    ENYA*        P_ENYA*
P52       Position (4)             P_POSITION (4)   Seed Output Enable*   P_SEED_OE      ENYB*        P_ENYB*
P51       Position (3)             P_POSITION (3)   Projective/Affine     P_PROJ/AFF*    ENP*         P_ENP*
P50       Position (2)             P_POSITION (2)   Rounding Mode (1)     P_FP_RND (1)   ENT*         P_ENT*
P49       Position (1)             P_POSITION (1)   Rounding Mode (0)     P_FP_RND (0)   FA           P_FA
P48       Position (0)             P_POSITION (0)                                        FTX          P_FTX
P47       Width Mac/Mic*           P_WID_MAC                                             FTY          P_FTY
P46       Width (4)                P_Width (4)                                           FTP          P_FTP
P45       Width (3)                P_Width (3)                                           PSEL (1)     P_PSEL (1)
P44       Width (2)                P_Width (2)                                           PSEL (0)     P_PSEL (0)
P43       Width (1)                P_Width (1)
P42       Width (0)                P_Width (0)
P41       Macro/Micro* Status      P_MIC/MAC
P40       Register Status          P_REG_STAT
P39       Load Macro Status        P_LD_MAC_STAT
P38       Borrow Mode              P_BM
P37       Memory Add Select (3)    P_MEM (3)
P36       Memory Add Select (2)    P_MEM (2)
P35       Memory Add Select (1)    P_MEM (1)
P34       Memory Add Select (0)    P_MEM (0)
P33       Memory Write En*         P_MEM_WR*


Figure 5-20. Am29300 Demonstration System Microinstruction Word Layout -- Control Section Controls

Control   Primary                  Primary          Alternate 1               Alternate 1     Alternate 2  Alternate 2
Pipeline  Field Name               Signal Name      Field Name                Signal Name     Field Name   Signal Name
Bit #     Meaning                                   Meaning                                   Meaning

P32       Cycle Length (1)         P_CLK_LEN (1)
P31       Cycle Length (0)         P_CLK_LEN (0)
P30       Interrupt Enable         P_INT_EN
P29       Force Continue           P_FC*
          If P29 = 1 then primary                   If P29 = 0 then alternate 1
P28       Seq Instruction (5)      P_SEQ_INST (5)   Interrupt Host            P_INT_HOST
P27       Seq Instruction (4)      P_SEQ_INST (4)   Sign Extend A_BUS         P_SIGN_EX
P26       Seq Instruction (3)      P_SEQ_INST (3)   Initialize                P_INIT
P25       Seq Instruction (2)      P_SEQ_INST (2)   Load Interrupt Base Add   P_LD_INT_BASE
P24       Seq Instruction (1)      P_SEQ_INST (1)
P23       Seq Instruction (0)      P_SEQ_INST (0)
          If P29 = 1 AND P28:27 != 11 then primary  If P29 = 0 OR P28:27 = 11 then alternate 1
P22       Test Select (3)          P_TEST (3)       Am29114 Instruction (3)   P_INT_INST (3)
P21       Test Select (2)          P_TEST (2)       Am29114 Instruction (2)   P_INT_INST (2)
P20       Test Select (1)          P_TEST (1)       Am29114 Instruction (1)   P_INT_INST (1)
P19       Test Select (0)          P_TEST (0)       Am29114 Instruction (0)   P_INT_INST (0)
P18       Load Operand Counter     P_LD_CNT
P17       Load Macro Op Reg        P_LD_MAC_OP
P16       Branch Field Enable*     P_BRANCH_EN*
P15       Branch Address (15)      D_BUS (15)
P14       Branch Address (14)      D_BUS (14)
P13       Branch Address (13)      D_BUS (13)
P12       Branch Address (12)      D_BUS (12)
P11       Branch Address (11)      D_BUS (11)
P10       Branch Address (10)      D_BUS (10)
P9        Branch Address (9)       D_BUS (9)
P8        Branch Address (8)       D_BUS (8)
P7        Branch Address (7)       D_BUS (7)
P6        Branch Address (6)       D_BUS (6)
P5        Branch Address (5)       D_BUS (5)
P4        Branch Address (4)       D_BUS (4)
P3        Branch Address (3)       D_BUS (3)
P2        Branch Address (2)       D_BUS (2)
P1        Branch Address (1)       D_BUS (1)
P0        Branch Address (0)       D_BUS (0)





If separate control bits were provided for the FPP or PM, 
they could perform multi-cycle operations such as New- 
ton-Raphson division in the FPP or greater than 32 by 32 
bit multiplies in the PM, while remaining detached from 
the input and output buses during most of the multi-cycle 
operation. If this were done, the ALU could operate in 
parallel during such operations. The cost of doing this 
would be an additional 15 to 35 bits added to the microcode
word width. These bits would get full use only in
those situations where parallel calculations are possible.


For this design it was decided to use a smaller microcode 
word by overlapping control bits for each of the three 
functional units. 


Data Path Selection: Only one functional unit (data 
path) in the data section is chosen in any one cycle. Bits 
65 and 64 select one of four options: 


Bit 
65 64 
0 0 ALU enabled 
0 1 PM enabled 
1 0 FPP enabled 
1 1 Special function 


In the special function option, the FPP is enabled for
calculation and the control bits are assumed to be set 
correctly for use by the FPP, but the output enable of the 
FPP is inactive with the ALU output enable active. The 
ALU is not enabled for calculation in the sense that its 
hold input is made active to prevent state change in the 
status or Q registers. 


This odd-looking combination is used to provide input 
operand parity checking for the FPP. The FPP does not 
have its own parity checking circuits, so with this arrange- 
ment the ALU parity checkers will be enabled by the 
active output enable on the ALU. The FPP is still allowed
to function and may complete its operation and store the 
result in its internal registers, while in the same cycle the 
input operand parity is checked by the ALU. The ALU 
state is left undisturbed by this operation. 


How useful is this scheme? It may save a cycle once in
a while, but mainly it illustrates the odd sort of opportuni- 
ties one may find to use up an otherwise wasted control 
code. 


ALU Path: When the data path select bits enable the 
ALU meaning for bits 63:38, bits 54 and 47 are used to 
select either the microcode or macroinstruction position 
and width fields. The macro supplied information is 
selected when these select bits are high. When the 
macro source is selected, the microcode position and 
width fields are unused. 


Bit 41 selects macro or micro status inputs for the ALU. 
Bit 40 selects whether the status output of the ALU is 
flow-through or registered. 


Bit 39 is used as a clock qualifier for the loading of the 
ALU external macro status register. 


Bit 38 directly controls the Borrow mode of the ALU. 


FPP Path: When the data path selects enable the FPP, 
the control bits shown directly manage the operation of 
the FPP as described by the Am29325 data sheet. Bit 52 
is used to enable the output of the FPP external “division 
seed” registered PROM. 


PM Path: When the data path selects enable the PM, the 
listed control bits are used as defined in the Am29C323
data sheet. 


Data Path Enabling: What does it mean to enable or
disable one of the functional units? The control bits that
are shared between each functional unit are either high
or low every cycle, and they are connected to the ALU
and multipliers all the time. There is no intervening logic
that turns all the control bits “off” when a particular path
is not selected. Each device sees a jumble of nonsense
on its control lines whenever the control field meaning is
intended for another device. Nonsense or not, each
device will do whatever the control bits specify.


Enabling a data path means making the output enable of 
the selected device active so that it drives the Y_BUS and 
is able to write calculation results back into the register 
file. In the case of the ALU, enabling also means that the 
ALU hold input will be made inactive so that state change 
of the ALU status and Q registers is allowed. Enabling 
one path implies disabling the other paths. 


For the PM and FPP, disabling means their output 
enables are inactive. It also means that the PM product 
register feed through pin is disabled by the control 
decode logic. For the FPP it means that both of its register
feed through lines are disabled by control decode logic.
These register feed through controls are disabled because,
if they are allowed to be active, it is possible for the
PM and FPP multipliers to feed back on themselves and
begin to oscillate. This action would not damage the 
devices, but it could add to power consumption and 
system power plane noise. A simple prevention is just to 
disable the feed-throughs when the data paths are not 
selected. Note that the ALU has no internal feedback 
paths and does not need any similar treatment. 
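

The enabling rules above can be summarized as a small truth-table model. This is a sketch under assumptions rather than the PAL equations from the design: every value is a logical "active" flag (True means active, whatever the electrical polarity of the pin), the function and key names are hypothetical, and the FPP feed-throughs are assumed to stay usable in the special function code because the FPP still calculates in that mode.

```python
def data_path_controls(dps: int, p_ftp: bool, p_fp_ft: tuple) -> dict:
    """Decode data path select bits 65:64 into logical enable flags.

    dps     -- value of pipeline bits 65:64 (0..3)
    p_ftp   -- PM product register feed-through request from the pipeline
    p_fp_ft -- the two FPP feed-through requests from the pipeline
    """
    alu_selected = dps == 0b00
    pm_selected = dps == 0b01
    fpp_selected = dps in (0b10, 0b11)   # 11 is the special function code
    special = dps == 0b11
    return {
        # ALU drives the Y_BUS when selected, or for parity checking in 11.
        "alu_oe": alu_selected or special,
        # Hold protects the ALU status and Q registers when it is not selected.
        "alu_hold": not alu_selected,
        "pm_oe": pm_selected,
        # In the special function code the FPP computes but does not drive.
        "fpp_oe": dps == 0b10,
        # Feed-throughs pass only when their own data path is selected.
        "pm_ftp": pm_selected and p_ftp,
        "fpp_ft": tuple(fpp_selected and ft for ft in p_fp_ft),
    }


if __name__ == "__main__":
    for code in range(4):
        print(format(code, "02b"), data_path_controls(code, True, (True, True)))
```

Note that at most one of the three output enables is active for any code, which is what keeps the Y_BUS free of contention.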


Memory Control: Bits 37:33 are available at all times to 
control the Am29300 system memory. 


Bit 33 is the memory write enable control. 


Bits 35:34 select the source of the address for the 
memory. 


Bit
35 34
0  0   No memory address or operation is selected
0  1   A_BUS data is used to address memory
1  0   The A memory address counter is selected for address
1  1   The B memory address counter is selected for address


Bits 37:36 select the following:


Bit
37 36
0  0   Load counter A
0  1   Load counter B
1  0   Selected counter is incremented
1  1   Selected counter is decremented


The increment and decrement commands have effect
only when a counter is selected as the MA_BUS source. 
The load commands have effect only when the A_BUS is 
the selected source. 
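

The interaction between the two halves of the memory control field can be modeled as follows. This is an illustrative sketch, not the PAL source for the address counters: the function name and dictionary keys are hypothetical, and bit 33 is assumed to be active low because its signal name carries the asterisk.

```python
ADDRESS_SOURCES = ("none", "A_BUS", "counter A", "counter B")


def decode_memory_control(bits_37_33: int) -> dict:
    """Decode the 5-bit memory control field (pipeline bits 37:33)."""
    command = (bits_37_33 >> 3) & 0b11   # bits 37:36
    source = (bits_37_33 >> 1) & 0b11    # bits 35:34
    write = (bits_37_33 & 1) == 0        # bit 33, assumed active low
    a_bus_selected = source == 0b01
    counter_selected = source in (0b10, 0b11)
    return {
        "address_source": ADDRESS_SOURCES[source],
        # Loads take effect only when the A_BUS is the address source.
        "load_counter_a": a_bus_selected and command == 0b00,
        "load_counter_b": a_bus_selected and command == 0b01,
        # Counting takes effect only when a counter is the address source.
        "increment": counter_selected and command == 0b10,
        "decrement": counter_selected and command == 0b11,
        "write": write,
    }


if __name__ == "__main__":
    print(decode_memory_control(0b00011))  # load counter A from the A_BUS
    print(decode_memory_control(0b10100))  # write via counter A, then count up
```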


Control Section Controls 


Figure 5-20 shows the bit definitions for the control
section.


Pipeline bits 32:31 control the length of each machine 
cycle. 


Bit
32 31
0  0   Normal cycle length
0  1   1.33 x Normal cycle length
1  0   1.66 x Normal cycle length
1  1   2 x Normal cycle length


Bit 30 enables sequencer interrupts on a cycle by cycle
basis.


Bit 29 is the Force Continue signal for the sequencer. 
When this bit is active, the sequencer will execute a 
continue instruction regardless of the state of the se- 
quencer instruction or test select lines. This effectively 
enables the alternate meaning for the sequencer instruction
and test select fields.


Bits 28:19 are normally the sequencer instruction and 
test select inputs. When Force Continue is active, the 
sequencer instruction field meaning changes. 


When Force Continue is active, bits 28:25 are used to 
control four individual functions. Bit 28 will send an 
interrupt signal to the host system. Bit 27 will enable the 
sign extension of data going from the D_BUS to the 
A_BUS. Bit 26 will force the control pipeline register to 
load data from the control store initialize register at the 
next active system clock. Bit 25 will enable the loading of 
the interrupt base address register. 


Bits 22:19 are used to control the sequencer test selec- 
tion. When an unconditional sequencer instruction is in 
effect or when the Force Continue bit is active, bits 22:19 
are used to control the Interrupt controller instruction. 
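

The overlapped meanings of bits 29:19 can be illustrated with a short decoder. This is a sketch under assumptions, not the design's control decode equations: Force Continue is taken as active when bit 29 is 0, and an unconditional sequencer instruction is taken to be one with bits 28:27 equal to 11, both as annotated in Figure 5-20. The function and key names are hypothetical, and the four alternate-function bits are assumed to be active high.

```python
def decode_control_section(p29: int, bits_28_23: int, bits_22_19: int) -> dict:
    """Resolve primary versus alternate meanings for pipeline bits 29:19."""
    force_continue = p29 == 0                       # P_FC* is active low
    unconditional = ((bits_28_23 >> 4) & 0b11) == 0b11
    alternate_low_field = force_continue or unconditional
    return {
        "force_continue": force_continue,
        # With Force Continue active the sequencer simply continues.
        "sequencer_instruction": None if force_continue else bits_28_23,
        "interrupt_host": force_continue and bool(bits_28_23 & 0b100000),  # bit 28
        "sign_extend": force_continue and bool(bits_28_23 & 0b010000),     # bit 27
        "initialize": force_continue and bool(bits_28_23 & 0b001000),      # bit 26
        "load_int_base": force_continue and bool(bits_28_23 & 0b000100),   # bit 25
        "test_select": None if alternate_low_field else bits_22_19,
        "interrupt_instruction": bits_22_19 if alternate_low_field else None,
    }


if __name__ == "__main__":
    print(decode_control_section(0, 0b010100, 0b0011))
    print(decode_control_section(1, 0b000101, 0b0110))
```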


Bit 18 is used to load the macro operand counters from 
the macro opcode register. 


Bit 17 is used to load the macro opcode register. 


Bit 16 enables the three-state outputs on the branch field 
bits of the control pipeline register. If these outputs are 
disabled, then the sequencer, A_BUS to D_BUS trans- 
ceiver, or Interrupt Controller may drive the D_BUS. How 
a device is chosen to drive the D_BUS is explained in the
control decode logic description. It is only important to 
note that if bit 16 is active, the branch field outputs will be 
active and will have priority over any other driver on the 
D_BUS. 


Bits 15:0 are the branch address field to the sequencer. 
This field is also used to contain constants or masks. 
These may be used by the data section, sequencer, 
interrupt base register, or interrupt controller. It is a full 16 
bits long in order to allow for constants or masks that fill 
half of the 32-bit data path. This allows 32-bit microcode 
supplied masks to be formed with two microinstructions. 
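

As a simple illustration of the two-instruction mask, the sketch below combines two 16-bit microcode constants into one 32-bit value. The text does not say how the data section merges the halves, so the shift-and-OR shown here, and the order of the two steps, are assumptions made only to show the arithmetic.

```python
def build_mask32(high16: int, low16: int) -> int:
    """Form a 32-bit mask from two 16-bit branch-field constants."""
    if not (0 <= high16 <= 0xFFFF and 0 <= low16 <= 0xFFFF):
        raise ValueError("each constant must fit in the 16-bit branch field")
    upper = high16 << 16     # first microinstruction: constant to upper half
    return upper | low16     # second microinstruction: merge the lower half


if __name__ == "__main__":
    print(hex(build_mask32(0xFF00, 0x00FF)))
```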


Alternate Arrangements 


The microcode word size just defined for this system 
totals 92 bits wide. Having so many bits allows the 
flexibility to change the control over most of the 
machine’s functions on any or every cycle. But, this 
degree of control flexibility is not required for every 
application. The size of the control store may be reduced 
based on how the system is used most often. Following 
are a few comments on ways to rearrange and reduce the
control store size. 


Current Control Bit Usage 


First let’s look at how the control bits are used in this 
design. 


Seven of the bits are used to control the selection of 
alternate field meanings (i.e., overlap control in bits 91, 
84, 77:76, 65:64, and 29). 


Eleven bits are used to control functions that are desired 
to operate in all cycles, independent of other system 
operations. These are the register file write and read 
enables (bits 69:66), memory controls (bits 37:33), and 
the cycle length controls (bits 32:31). 


Eight bits generally do not change state frequently. Their
existence in this design is a convenience that reduces the
need for control decode logic and adds system flexibility. 
These bits are 41:38, 30, 18:16. 


Three bit fields are used only with some instruction
types. These are the position, width, and branch fields. 
Whenever a particular instruction does not use a field, 
those bits in the field are currently wasted in that in- 
struction cycle. 


Alternative Usage 


The bits that change infrequently could be replaced by
decode logic that provides these same control signals via
set-reset flip flops. The flip flops would be controlled by 
overlapping set and reset commands with some other 
control store field. This would add to the decode logic 
complexity and would limit when the flip flops could be 
changed by restricting the control over them to certain 
instruction types. Since they change only infrequently, 
the requirement to use certain instruction types when 
setting or resetting them should not be a problem. 


Those bit fields that are limited to certain instruction types
could be overlapped. An example might be to overlap the 
position and width fields with the branch address field. 
This would restrict branches to instructions that do not 
require the position or width information. 


When alternative field meanings are enabled, often the 
alternative definition does not make use of all the bits in 
the field. This presents the opportunity to overlap other 
control bits that may be valid in the same cycle as the 
alternate meaning of the field. 


For example, some of the infrequently-used control bits 
could be overlapped with the unused bits of the register 
C address when the primary meaning of the C address 
field is not active. When a two address instruction is 
executed, the address for the C register comes from the 
A or B address, thus leaving the microcode field for the C
register address available for other functions. 


In another example, the bits in the position and width 
fields that are not used by the PM or FPP could be 
overlapped with other control functions when the alter- 
nate meanings for the field are in effect. An alternate 
branch address field might be placed in those bits to allow 
branch instructions in combination with FPP or PM 
operations without the need for the currently defined 
branch field.


Careful analysis of how each data path is used may also
allow reductions through the elimination of controls that
are not needed. As an example: if the PM were used
only in flow through mode, all the controls for register
enables, flow through modes, and input multiplexers 
could be removed from the microcode word and those 
inputs to the PM tied to fixed voltage levels. If only two’s 
complement mode is used then an additional two bits 
may be eliminated. This would leave only four necessary 
control bits, the accumulator controls, rounding mode, 
and format adjust. This reduction might allow PM 
operations to be overlapped with some multiply-accumu- 
late operations in the FPP. 


By combining these reduction techniques, the following 
changes could be made: 


All of the eight infrequently used control bits could be
moved to overlap with the C register address, with half in
effect when the A address is substituted for the C address
and half in effect when the B address is substituted.


The PM controls, except for flow through and two’s
complement mode, may be moved to overlap with the
position, width, and memory control fields. Also, the
fourth data path select code may be changed to disable
the memory controls and select the ALU (minus the
position and width fields) to be active along with the PM.
In this mode the PM flow through and two’s complement 
mode controls would be fixed with no flow through and 
two’s complement mode active. The ALU position and 
width inputs would be set to 0 and 31 respectively by 
control decode logic (unless these fields were selected to 
come from the macro opcode). 


The branch address field may be moved to overlap with 
the position, width, and memory control fields. Whenever
the sequencer instruction selects a branch operation, the 
position, width, and memory fields are disabled and the 
branch address meaning substituted. 


If all of these changes are made, the currently defined 
branch address field and infrequently used control bits 
may be eliminated, which would save 24 bits of microc- 
ode word width. This would reduce the word size to 68 
bits. 
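

The bit counts quoted in this discussion can be checked against the field layout of Figures 5-18 through 5-20. The sketch below is only a bookkeeping aid: the field names are informal labels, some adjacent single-bit controls are grouped into one entry, and the "removable" flag marks the fields that the text proposes to eliminate (the branch address field and the eight infrequently changed bits).

```python
# (label, high bit, low bit, removable in the reduced layout)
FIELDS = [
    ("reg A macro/micro select", 91, 91, False),
    ("register A address",       90, 85, False),
    ("reg B macro/micro select", 84, 84, False),
    ("register B address",       83, 78, False),
    ("reg C address source",     77, 76, False),
    ("register C address",       75, 70, False),
    ("register file enables",    69, 66, False),
    ("data path select",         65, 64, False),
    ("ALU instruction",          63, 55, False),
    ("position macro/micro",     54, 54, False),
    ("position",                 53, 48, False),
    ("width macro/micro",        47, 47, False),
    ("width",                    46, 42, False),
    ("status and borrow mode",   41, 38, True),
    ("memory control",           37, 33, False),
    ("cycle length",             32, 31, False),
    ("interrupt enable",         30, 30, True),
    ("force continue",           29, 29, False),
    ("sequencer instruction",    28, 23, False),
    ("test select",              22, 19, False),
    ("load and branch enables",  18, 16, True),
    ("branch address",           15, 0,  True),
]


def width(hi: int, lo: int) -> int:
    return hi - lo + 1


def total_width(fields) -> int:
    return sum(width(hi, lo) for _, hi, lo, _ in fields)


def removable_width(fields) -> int:
    return sum(width(hi, lo) for _, hi, lo, removable in fields if removable)


def covers_each_bit_once(fields, top_bit: int = 91) -> bool:
    """True when the fields tile bits 0..top_bit with no gaps or overlaps."""
    bits = sorted(b for _, hi, lo, _ in fields for b in range(lo, hi + 1))
    return bits == list(range(top_bit + 1))


if __name__ == "__main__":
    total = total_width(FIELDS)
    saved = removable_width(FIELDS)
    print(total, saved, total - saved)
```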


This savings would come at the cost of allowing branch 
instructions only when the ALU instruction does not need 
position or width information from the microcode (this 
information may still come from the macro opcode regis- 
ter) and when the system memory is not being used. 
Further, a PM operation could not occur with a memory 
access in the same cycle. Also, with these changes it 
would be possible to control the ALU and PM concur- 
rently when the ALU does not need position or width 
information and when the PM operates on internally 
registered data. 


There are many such combinations of microcode control
field definition. Each one provides a different trade-off 
between word size and what operations may be concur- 
rent. Each one requires a different degree of complexity 
in the control decode logic. 


CONTROL DECODE 
What Is It Good For? 


The ideal microprogrammed system has a separate 
microcode control store bit for each control input that 
exists in the system. This kind of complete control over 
every aspect of the system directly from the control 
pipeline totally eliminates the need for decoding the 
meaning of any system control bits. It also requires a very 
large microcode word to manage most useful systems. 
So in the real world, most microprogrammed systems 
encode or overlap at least some control functions in the 
microcode word. 


Encoded control or not, each control input in the system 
requires valid voltage levels during each machine cycle 
if the system is to operate as expected. 


The control decode logic acts as the bridge between 
encoded or overlapped (i.e., sometimes unavailable) 
microcode control fields and the related control signals in 
the system. The control decode logic continuously pro- 
vides valid logic levels for those control signals that 
cannot be directly driven by the control pipeline register. 


If the control field for a particular function is encoded, the 
control logic translates the function codes into individual 
control signals. Where control fields are overlapped, the 
control logic may be used to disable logic affected by a 
control field when that field has a meaning different than 
that intended for the logic being disabled (i.e., when 
overlapped control is active). 


In some cases, control logic is used to prevent harmful 
conflicts between the meaning of different control bits, for 
example when two separate control fields affect the 
three-state enables on different buffers which may drive 
the same signal line. Certain combinations of control bits 
might enable both buffers in the same cycle causing 
contention between the buffers. Allowed to continue for 
long periods, this kind of contention may destroy the 
buffers. Control logic may be used in this situation to 
disable one or both buffers when the combination of 
controls affecting them would otherwise cause damage. 
In fact it is strongly recommended that this kind of 
problem always be avoided by designing the control 
decode logic to prevent such disasters. The alternative is 
to watch hardware melt because of a software mistake. 


Control Logic Description 


Some of the control logic function in this demonstration 
system has been distributed into the devices being 
controlled. This is done when a PAL is used to implement 
a function. A PAL generally has excess inputs and 
internal logic that may be put to use in decoding the 
meaning of encoded control fields (e.g., the memory
address counters). The memory address counters are
implemented with AMPAL22V10 devices and are shown
in Figure 4-7. The control for loading, incrementing,
decrementing, and output enabling the counters is pro- 
vided directly from the encoded memory control field. 
The PALs internally decode the meaning of the control 
bits. 


When a device requires a decoded control signal, the 
signal must come from control decode logic that takes 
control pipeline bits as input and produces the needed 
control signal. In this system, the required control logic 
has been implemented in three AMPAL18P8B PALs. 
These PALs are fast to minimize the delay induced 
between the control pipeline register and the device 
controlled. The PALs also provide the convenience of 
having programmable output levels, either high or low 
active for each output, independent of other outputs. 


The block diagram for these PALs is shown in Figures 5- 
21 and 5-22. The logic definition files for these PALs are 
in Appendix M. 


The ALU output enable, ALU hold, and PM output enable 
are all direct results of the pipeline data path select bits. 


The pipeline controls for seed register output enable, PM 
flow through, and FPP flow through are gated by the 
appropriate data path selection so that each control 
signal is active only when the related data path is se- 
lected. 


The D_BUS to A_BUS direction of the D_BUS trans- 
ceiver is enabled by the register file A output’s being 
disabled in conjunction with the seed register output’s 
being disabled. 


The A_BUS to MD_BUS buffer is enabled by certain 
codes of the memory control field. 


The control store initialize register select is enabled by 
the combination of the pipeline Force Continue and the 
pipeline control bit for the initialize select. It is also 
enabled by the WCS_INIT™ signal from the host interface 
controller. Note that the initialize control is synchronous 
as used in this system so that the initialize word is loaded
only at the next active clock. 
Figure 5-21. Control Decode Logic Part 1
Inputs: P_DSP (1), P_DSP (0), P_OEA*, P_SEED_OE, P_FTP, P_FP_FT (1), P_FP_FT (0), P_MEM (3), P_MEM (2), P_MEM (1), P_MEM (0), P_FC*, P_INIT, WCS_INIT*
Outputs: ALU_OE*, ALU_HOLD, PM_OE*, SEED_OE*, FTP, FP_FT (1), FP_FT (0), D_OER*, A_MD_OE*, INIT_MC*
Figure 5-22. Control Decode Logic Part 2
Inputs: P_BRANCH_EN*, P_FC*, P_INT_INST (3), P_INT_INST (2), P_INT_INST (1), P_INT_INST (0), P_SEQ_INST (5), P_SEQ_INST (4), P_SEQ_INST (3), P_SEQ_INST (2), P_SEQ_INST (1), P_SEQ_INST (0)
Outputs: D_OET*, SEQ_OED, IEN*, INT_CS*, D_SIGN_EX
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The D_BUS sign extend, Sequencer output enable, 
Interrupt controller instruction and chip select enables, 
and A_BUS to D_BUS enable are all direct results of the 
pipeline sequencer instruction, interrupt controller in- 
struction, branch enable, and Force Continue bits. 


The Sequencer output enable, A_BUS to D_BUS enable,
and interrupt controller chip select are used to
control which device is allowed to drive the D_BUS in any
given cycle. These output enables, together with the branch
field of the control pipeline, are arranged in a priority
so that only one output is active in any cycle.


The highest priority output is the branch field. If it is
enabled, all other outputs are disabled.


If the branch field is disabled, then the Sequencer D 
output is enabled if either a Continue or a Pop D instruc- 
tion is being executed. 


If neither the branch field nor the sequencer is enabled,
then the interrupt controller may drive the D bus if the 
interrupt controller instruction is one of three read 
operations. 


If none of the above conditions exists to enable the other
D_BUS devices, then the A_BUS to D_BUS transceiver
path is enabled.
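
The priority scheme above can be sketched as a simple selection function. This is only an illustrative model, not the PAL equations from Appendix M: the instruction mnemonics are hypothetical placeholders (the text names Continue and Pop D for the sequencer but does not name the three interrupt controller read operations), and the Force Continue bit is not modeled.

```python
from enum import Enum


class DBusDriver(Enum):
    BRANCH_FIELD = "pipeline branch field"
    SEQUENCER = "sequencer D output"
    INTERRUPT_CONTROLLER = "interrupt controller"
    A_BUS_TRANSCEIVER = "A_BUS to D_BUS transceiver"


# Hypothetical mnemonics; the actual encodings are in the PAL definition files.
SEQ_D_OUTPUT_INSTRUCTIONS = {"CONTINUE", "POP_D"}
INT_READ_INSTRUCTIONS = {"READ_1", "READ_2", "READ_3"}


def select_d_bus_driver(branch_enabled, seq_instruction, int_instruction):
    """Return the single device allowed to drive the D_BUS this cycle."""
    if branch_enabled:                                # highest priority
        return DBusDriver.BRANCH_FIELD
    if seq_instruction in SEQ_D_OUTPUT_INSTRUCTIONS:  # Continue or Pop D
        return DBusDriver.SEQUENCER
    if int_instruction in INT_READ_INSTRUCTIONS:      # one of three reads
        return DBusDriver.INTERRUPT_CONTROLLER
    return DBusDriver.A_BUS_TRANSCEIVER               # default path
```

Because the function returns exactly one driver for any input combination, it reflects the property that only one output may be active in a cycle.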
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Note that the interrupt controller chip select is treated as 
both an instruction enable and as an output disable. The 
chip select is active whenever there is a valid interrupt 
instruction that would not cause a conflict with another 
driver of the D_BUS. This means that when there is a 
valid instruction, the chip select will be inactive only if a 
read instruction is selected and either the branch field or 
sequencer are already enabled on the D_BUS. If any 
other interrupt instruction is in effect, the interrupt control- 
ler does not drive its outputs. 
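
As a rough illustration of the chip select rule just described, the condition can be written as a small boolean function. The argument names are assumptions made for this sketch and do not correspond to signal names in the PAL definition files.

```python
def interrupt_chip_select(valid_instruction, is_read,
                          branch_enabled, sequencer_enabled):
    """Return True when the interrupt controller chip select should be active.

    The select is withheld only when a read instruction would collide with
    a higher-priority D_BUS driver (branch field or sequencer).
    """
    if not valid_instruction:
        return False
    conflict = is_read and (branch_enabled or sequencer_enabled)
    return not conflict
```

A non-read instruction is always selected when valid, since in that case the interrupt controller does not drive the D_BUS and cannot conflict with another driver.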


The above scheme for managing the access rights to the 
D_BUS may seem a bit complex but it allows great 
flexibility in movement of information over the D_BUS. 
Information may be moved between the interrupt control- 
ler and sequencer, interrupt controller and A_BUS, or 
sequencer and A_BUS. Information may be loaded into 
the interrupt base address register from the pipeline, 
sequencer, or A_BUS. Also, the pipeline may provide 
data to the sequencer, interrupt controller, interrupt base 
address register, or A_BUS. 
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SECTION 6 
System Timing and Critical Path Analysis 


DEFINITIONS 


The upper limit on system speed is set by the slowest 
signal propagation path in the system. 


The length of a signal propagation path is measured from 
the output of one register to the input of another register, 
where all registers are loaded by the same clock. 


The slowest signal path will be different for different 
control states. An example would be the selection of the 
ALU data path vs. the FPP data path. 


A signal path may be slower in the first cycle that control 
selects the path than it will be in a subsequent cycle that 
maintains the same path selection. This can be due to 
three-state enable or disable times being longer than 
normal propagation delays of the circuits involved. 
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CONTROL AND DATA PATHS 


In determining the maximum system speed, every signal 
path must be analyzed. This requires tracing every 
control signal and every data signal and totaling the delay 
induced by each component along the path from source 
register to destination register. Where parallel paths 
exist, the time delay for the slowest path is used. 


Most often, the critical (slowest) paths originate with the 
pipeline control register. In the data section the paths will 
end with data being loaded into the register file, an FPP 
or PM internal register, the system memory, or a D_BUS
destination. In the control section the paths will end with 
loading of new control bits into the control pipeline 
register. 
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Figure 6-1. Data Section Timing Paths 
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Figure 6-2. Control Section Timing Paths 


Since the control section and data section operate in 
parallel, the slowest path in either section will determine 
the cycle length required for a specific operation. 


Figures 6-1 and 6-2 provide a block diagram view of 
significant signal pathways for both control and data lines 
in both the control and data sections. 


Referring to these figures as critical timing paths are 
discussed may help in following the timing analysis. 


In this and nearly any complex system, there are hundreds
of pathways that must be traced in order to ensure
finding all the worst case delays. To go through all of them
here would require too much time and space. Many of the
timing paths for this design have already been analyzed, 
and what appear to be the worst case paths will be shown 
here. 


WORST CASE PATHS 


Each case is shown in Table 6-1. The table is separated 
into several pages due to its length. It can be viewed as 
a long spreadsheet calculation in which the appropriate 
timing parameters that apply to each case have been 
selected and placed in the correct column. Only the worst 
case delay for each segment of a critical path is shown. 


Parallel but faster paths have been eliminated from each
case so that the total of the times listed for a case
represents the minimum time in which a path can be
traveled.


Case Definitions 


1. Basic flow-through calculation, data path.


Data is moved from the register file through the
ALU and back to the register file. The timing path
begins at the control pipeline, where the register
file addresses for the A and B read operands
appear after the clock to output delay of the
control pipeline register. These addresses flow
through the Am29827 buffer that forms one side
of the register file address multiplexer. The
address accesses the register file, and one access
time later the data operands are presented
to the ALU. By this time the control signals for the
ALU instruction have been stable long enough
that the flow-through time of the data in the ALU
will be the slower path. Once data is on the Y bus,
the last delay is the set-up time for the register file
before the clock can occur. Again, the control path to
the register file (A port write address) is faster
than the data path, so the data path is the limiting
factor.


The total delay for this path is 96 ns. If the PM
path were substituted for the ALU, the delay would
be 174 ns. If the FPP were substituted, the delay
would be 179 ns. So flow-through calculations with
either of the multipliers will require an extended
cycle length.
2. Basic flow-through calculation, position control
path.


This case is the same as Case 1 except that a 
careful look at the control path for the position 
input to the ALU is taken. This path turns out to 
be 97 ns worst case. This is an example where 
the control path is a little slower than the data 
path. 


3. Flow-through calculation with address supplied
by the Macro operand counter; counter output 
enabled same cycle. 


Again this path is similar to Case 1. The differ- 
ence is that the read addresses are assumed to 
come from the Macro operand counters. It is 
further assumed that the counters are selected 
during the cycle analyzed. This means that the
output enable time of the counter must be added
to the clock to output time for the pipeline bit that
selects the Macro operand counter.


This increases the delay path to 115 ns, indicating
that during the first cycle in which a Macro
operand counter is used as the address source,
the cycle length will need to be extended.


4. Flow-through calculation with address supplied
by the Macro operand counter; counter output 
enabled prior cycle. 


This case is a comparison with Case 3, where 
the Macro operand counter was output enabled 
in the previous cycle. The counter delay is thus 
limited to the clock to output delay of the 
counter. This reduces the cycle time requirement
to 90 ns. So sequential register file
address cycles using an operand counter can
be completed within the normal cycle time.
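
The totals quoted for Cases 2, 3, and 4 can be reproduced by summing the segment delays along each path. The sketch below uses values as read from Table 6-1; the 9 ns register file data set-up is inferred from the case columns of the flattened table, and the constant names are invented for this example.

```python
# Segment delays in ns, as read from Table 6-1 (names are illustrative).
PIPELINE_TCO = 15        # control pipeline clock to output
COUNTER_TCO = 15         # operand counter clock to output
COUNTER_OE = 25          # operand counter output enable to output valid
POSITION_MUX_TPD = 25    # ALU position and width mux, input to output
REGFILE_ACCESS = 24      # register file address to read data output
ALU_DATA_TO_Y = 42       # ALU data A or B to Y
ALU_POSITION_TO_Y = 48   # ALU position to Y
REGFILE_SETUP = 9        # register file data set-up (inferred value)

CASE_PATHS = {
    2: [PIPELINE_TCO, POSITION_MUX_TPD, ALU_POSITION_TO_Y, REGFILE_SETUP],
    3: [PIPELINE_TCO, COUNTER_OE, REGFILE_ACCESS, ALU_DATA_TO_Y, REGFILE_SETUP],
    4: [COUNTER_TCO, REGFILE_ACCESS, ALU_DATA_TO_Y, REGFILE_SETUP],
}


def path_delay(segments):
    """Total register-to-register delay of one serial path, in ns."""
    return sum(segments)


CASE_TOTALS = {case: path_delay(segs) for case, segs in CASE_PATHS.items()}
```

In Case 4 the counter output is already enabled, so the path is taken to start at the counter's own clock to output delay rather than at the pipeline register; that reading is what yields the 90 ns figure.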


5. First cycle of FPP Newton-Raphson division,
seed value load. 


In this case the critical path starts at the control 
pipeline clock to output delay, and then goes 
through the control decode logic that enables the 
output of the Seed register. In this case it is 
assumed that the seed value is multiplied and
stored in an FPP internal register to complete the
first cycle of a Newton-Raphson division. This 
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requires a total of 169 ns. Note that if the seed 
value had simply been moved into the input 
register of the FPP, the total delay would have 
only been 73 ns. 


6. Memory read with address from the register file,
selected by microcode. 


This is a simple memory read with the time 
starting at the pipeline clock to output delay, 
followed by the address mux, register file ac- 
cess, A_BUS to MA_BUS buffer, memory, and 
register file data set-up time. The total time 
comes in at 99 ns, just under the desired 100 ns 
basic cycle time. 


7. Memory read with address from a memory
address counter. 


Here the access time of the register file is essen- 
tially traded for the output enable time of a 
memory address counter. The total delay only 
improves to 94 ns, but there is a big advantage 
in the fact that for a sequential access the CPU 
did not need to calculate a memory address. 
This will save at least one cycle. Also, it is 
possible to overlap a memory read from an 
address counter with a calculation cycle in the 
CPU. 


8. Memory write with data from register file,
selected by operand counter.


In a memory write case, time is saved by needing
only to meet the data set-up time of the memory 
rather than the memory access time plus the 
register file set-up time, as would be the case in 
a read operation. In this case the time gained is 
traded for the time required to output enable an 
operand counter. Even so, the total time is still 
94 ns. 


9. Move register file data to interrupt controller or
sequencer, data selected by operand counter. 


Here again, the long delay path of using a Macro
operand counter as the register file address
source is used. Even with the output enable 
delay of the counter in addition to the pipeline 
clock to output time, the total delay comes in at 
89 ns. 
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10. Move sequencer or interrupt controller data to 
register file.


In the reverse of the above case, the time to get
data from the D_BUS is similar to the time in Case 9
to access data from the register file. The big
delay here is the need to move the data from the
A_BUS, through the ALU, and back to the register
file. Not having a direct path to the Y_BUS
costs a good bit of time. The total comes in at
127 ns. Fortunately, this type of data move is
not likely to be a commonly executed cycle.


11. Sequencer branch, conditional or unconditional. 


In this case much of the delay is in the pipeline 
clock to output time for the branch field enable 
bit, cascaded with the output enable time of the 
branch field in the control pipeline register. This 
is followed by the branch address flow through 
time of the sequencer and the access time of the 
control store. Even with all the delay, this path is 
significantly faster than most of the data section 
paths. The total time is 84 ns. 


12. Sequencer interrupt or trap cycle. 


In this case the pipeline output doesn’t turn out to
be in the main delay path. The interrupt starts at
the clock to output delay of the trap logic where
the interrupt request is generated. The sequencer
then responds with interrupt acknowledge,
which in turn output enables bit 3 of the
interrupt vector from the trap logic. The interrupt
vector then accesses the control store. The total
for this cycle is 81 ns.


13. Sequencer branch to macro opcode specified
instruction. 


Here the initial delay is the clock to output delay 
of the macro opcode register, followed by the 
access time of the map RAM. Next is the branch
flow through time for the sequencer and the 
access of the control store. This cycle comes in 
at 85 ns. 


FINAL RESULTS 


Several cases were shown here to help give an idea of 
how fast the system is for different instructions. These 
cases were some of the worst identified during the critical
path analysis of this design. All but three of the cases shown
fit within the desired 100 ns basic clock cycle. Two of
the cases (Cases 3 and 10) require a cycle only 1 1/3 times normal.
Case 5 officially needs a double length cycle.
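
The classification above can be checked against the per-case totals listed in Table 6-1E. The sketch below assumes the clock can be stretched to exactly the three lengths mentioned in the text (1, 1 1/3, and 2 times the 100 ns basic cycle); how the clock generator actually selects these lengths is not covered here.

```python
from fractions import Fraction

BASIC_CYCLE_NS = 100
# Cycle lengths mentioned in the text, as multiples of the basic cycle.
CYCLE_MULTIPLIERS = [Fraction(1), Fraction(4, 3), Fraction(2)]

# Minimum cycle time per case in ns, from Table 6-1E.
CASE_TOTALS_NS = {1: 96, 2: 97, 3: 115, 4: 90, 5: 169, 6: 99, 7: 94,
                  8: 94, 9: 89, 10: 127, 11: 84, 12: 81, 13: 85}


def required_multiplier(delay_ns):
    """Smallest listed cycle multiple that covers the given path delay."""
    for multiplier in CYCLE_MULTIPLIERS:
        if delay_ns <= BASIC_CYCLE_NS * multiplier:
            return multiplier
    raise ValueError(f"{delay_ns} ns exceeds the longest available cycle")


# Cases that do not fit in the basic cycle, with the multiple they need.
EXTENDED_CASES = {case: required_multiplier(total)
                  for case, total in CASE_TOTALS_NS.items()
                  if required_multiplier(total) > 1}
```

With these inputs, three cases fall outside the basic cycle: Cases 3 and 10 fit in a 1 1/3 length cycle, and Case 5 needs a double length cycle.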


As noted in the discussion of Case 1, both the PM and 
FPP require much longer cycles for flow through calcula- 
tions. If the PM and FPP are used in clocked multiply 
mode for sequential pipelined multiplies, as would occur 
in array calculations, the cycle time can be significantly 
reduced. In clocked multiply mode the PM or the FPP 
requires only 100 ns cycle times. 


With a dynamically variable clock cycle length, this system
can run most instructions at the basic 100 ns cycle
rate, but will still handle the occasional extended execution
time instructions.
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Am29300 Demonstration System 


Table 6-1A 


Data Path Element 
Parameter Description 


Control Store/Register - 
Am9151-50 
Clock to Output 
OE to Output Valid 
Synchronous I to Clock Set-up
Address to Clock Set-up 


Control Decode Logic - 
AmPAL18P8B 
Input to Output 


Macro Opcode Register - 
Am29818-1 
Clock to Output 
Input to Clock Set-up 


Macro Operand Counters - 


AmPAL22V10A 
Clock to Output 
Input to Clock Set-up 
OE to Output Valid 


Reg File A or B Read 
Add Mux - Am29827A 
Input to Output 
OE to Output Valid 


Reg File C Write Add Mux - 


AmPAL18P8Q 
Input to Output 


Symbol 


Tpkhdqv1 
Tgldqv 


Tivpkh 
Tavpkh 


Tpd 


Tpd 
Ts


Tco 
Ts 
Tea, Ter 


Tph 
Tzh 


Tpd 





Signal Path Timing Analysis 


Worst Case Time Delay in Nanoseconds, Over Commercial Operating Range 


Value | Case 


15 
20 


25 
30 


15 


11 


15 
20 
25 


35 


Case 
1 2 3 4 


15 15 15 


15 
25 


5 


15 


15 





6 7 8 


15 15 15 


25 


Case | Case | Case | Case | Case | Case | Case | Case | Case 


9 10 11 


15 15 15 
20 
30 


15 


29 


Case 
12 


30 


Case 
13 


30 


11 
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Table 6-1B 


Data Path Element


Parameter Description 


Reg File B Write Add Mux - 
AmPAL22P10AL 
Input to Output


ALU Position & Width Mux - 
AmPAL22P10AL 
Input to Output 


Register File - Am29334 
Address to Read 
Data Output 
OE to Output Valid 
OE to Output Three-state 
Data Set-up 


ALU -- Am29332 
Data A or B to Y Parity 
Instruction to Y Parity 
Width to Y Parity 
Position to Y Parity 


Parallel Multiplier -
Am29C323 
Unclocked Multiply X or Y 
to P Parity 
Clocked Multiply, 
Cycle Time 
Clocked Multiply, 
Data to Clock Set-up 
Clocked Multiply, 
Clock to Output 


Symbol 


Tpd 


Tpd 


Access 
Turn-on 
Turn-off 
Tds 


Tmuc 


Tmc 


Tsxy 


Tpdpp 


Worst Case Time Delay in Nanoseconds, Over Commercial Operating Range 


Value | Case | Case | Case | Case | Case | Case | Case | Case | Case | Case [| Case | Case 


25 


25 


24 
20 
16 


42 
53 
40 
48 


150 


| 125 


20 


40 


| 3 4 5 6 
25 
24 24 | 24 | 24 
9 9 9 9 9 
42 42 | 42 
48 


rd 


8 


24 


9 


24 


10 


42 


11 


12 


Case 
13 


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Table 6-1C 


Data Path Element 
Parameter Description 


Floating Point Processor - 
Am29325 
Unclocked Operations
Clocked Operation 
Clocked Multiply, 
Data to Clock Set-up 
Clocked Multiply, 
Data to Clock Set-up 


FPP Seed Register - 
Am2920 & Am27S25 
OE to Output Valid 


FPP External Status 
Register - AmPAL22V10A
Clock to Output 
Input to Clock Set-up 


Macro Status Register - 
Am29818-1 
Clock to Output 
Input to Clock Set-up 


Memory Address or 
Data Buffer -Am29827 
Input to Output 
OE to Output Valid 


Memory Address Counters - 
AmPAL22V10 
Clock to Output 
Input to Clock Set-up 
OE to Output Valid 


Symbol


Tsd1 


Tsd2 


Tzh 


1 TCO 


Ts 


Tpd 


_{Ts 


| Tph 


Tzh 


Tco 
Ts 
Tea, Ter 





Worst Case Time Delay in Nanoseconds, Over Commercial Operating Range 


Value | Case | Case | Case | Case | Case 
5 


| 125 


100 


4 
! 


104 


15 
20 


11 


10 
17 


25 
30 
35 


1 


2 re 


3 


104 


35 


Case | Case | Case | Case j Case | Case 


6 | 10 


10 


7 


35 


_8 


10 


a. 


Case | Case 


12 


13 







Table 6-1D 





Data Path Element 
Parameter Description 


Symbol 

Memory - Am99C165-35
Chip Enable Access Time Telqv
Address Access Time Tavqv
Chip Enable to Output Disable Thz
Write Pulse Width Twlwh
Data to Write End Set-up Tdvwh
Address to Write End Set-up Tavwh
Write to Output Disable Twlqz
D_BUS - A_BUS 


Transceiver - Am29853 
Input to Parity Output Tpd 
OE to Output Valid Tzh 


D_BUS - A_BUS Parity 

Buffer - Am29862 
Input to Output Tpd 
OE to Output Valid Tzh 


Map RAM - Am9150-25 
Address to Data Taa 


Interrupt Controller - 
Am29114
Clock to Interrupt Request 
Instruction Enable to 
Data Output 
Data in to Clock Set-up 
MINTA* to Vector OE 


Trap Logic - AmPAL22V10A
Clock to Output Tco 
Input to Clock Set-up Ts 
OE to Output Valid Tea, Ter 


Worst Case Time Delay in Nanoseconds, Over Commercial Operating Range 


Value | C 


35 
35 


20 
30 
20 


30 
10 


25 


41 


30 
10 
19 


15 
20 
25 


ase | Case 


1 


2 


Case | Case | Case | Case 


3 


4 


5 


6 


35 


Case | Case | Case | Case | Case | Case | Case 


7 


35 


8 


20 


Q 10 11 12 13 


15 15 


25 


10 


15 
25 
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Table 6-1E 


Data Path Element 
Parameter Description 


Sequencer - Am29331 
Branch Input to Y Output 
Instruction to Y Output 
Instruction to D Output 
Force Continue to 
Y Output 
Interrupt Request to 
Interrupt Ack 
OE D to D Valid 


Minimum Cycle Time 
per Case 


Symbol 


Worst Case Time Delay in Nanoseconds, Over Commercial Operating Range 


Value | Case | Case | Case | Case | Case | Case | Case | Case | Case | Case | Case | Case 


19 
25 
31 


2 


1 
2 


om 


Ol as 


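Sequencer delays used: Branch Input to Y Output, 19 (Cases 11 and 13); Interrupt Request to Interrupt Ack, 11 (Case 12).

Minimum Cycle Time per Case (ns):

Case | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
Time | 96 | 97 | 115 | 90 | 169 | 99 | 94 | 94 | 89 | 127 | 84 | 81 | 85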
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Am29300 Demonstration System Signal Path Timing Analysis 


Table 6-1F 
Case Definitions 




1. Basic flow through calculation, data path. 
Pipeline, Tco; Address Mux, Tpd; Register File, Tpd; ALU, Tpd; Register File, Set-up.


2. Basic flow through calculation, position control path. 
Pipeline, Tco; Position Mux, Tpd; ALU, Tpd; Register File, Set-up. 


3. Flow through calculation with address supplied by operand counter; counter output enabled same cycle. 
Pipeline, Tco; Operand Counter, Tea; Register File, Tpd; ALU, Tpd; Register File, Set-up. 


4. Flow through calculation with address supplied by operand counter; counter output enabled prior cycle. 
Pipeline, Tco; Operand Counter, Tco; Register File, Tpd; ALU, Tpd; Register File, Set-up.


5. First cycle of FPP Newton-Raphson division, seed value load. 
Pipeline, Tco; Control Decode, Tpd; Seed Register, Tzh; FPP Internal Register Set-up, Tsd2. 


6. Memory read with address from the register file, selected by microcode. 
Pipeline, Tco; Address Mux, Tpd; Register File, Taa; Memory Address Buffer, Tpd; Memory, Taa; Register File, Set-up. 


7. Memory read with address from a memory address counter. 
Pipeline, Tco; Control Decode, Tpd; Memory Address Counter, Tzh; Memory, Taa; Register File, Set-up. 


8. Memory Write with data from register file, selected by operand counter. 
Pipeline, Tco; Operand Counter, Tea; Register File, Taa; Memory Address Buffer, Tpd; Memory, Write Set-up. 


9. Move register file data to interrupt controller or sequencer, data selected by operand counter. 
Pipeline, Tco; Operand Counter, Tea; Register File, Taa; A to D Bus Xcever, Tpd; Interrupt Controller, Data Set-up.


10. Move sequencer or interrupt controller data to register file. 
Pipeline, Tco; Control Decode, Tpd; Sequencer, OE D to D; D to A Bus Xcever, Tpd; Parity Buffer, Tpd; ALU, Tpd; Register File, Set-up.


11. Sequencer branch, conditional or unconditional. 
Pipeline, Tco; Pipeline Branch Field, Tzh; Sequencer, D to Y; Control Store, Address Set-up. 


12. Sequencer interrupt or trap cycle. 
Trap Logic, Clock to INTR; Sequencer, INTR to INTA; Trap Logic, Tea; Control Store, Address Set-up. 


13. Sequencer branch to macro opcode specified instruction. 
Macro Opcode Register, Tco; Map RAM, Taa; Sequencer, A to Y; Control Store, Address Set-up.
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SECTION 7 
Physical Issues 


ELECTRICAL LAYOUT ISSUES FOR 
POWER SUPPLY 


The TTL compatible, bipolar, Am29300 family compo- 
nents all use internal ECL circuitry with TTL compatible 
I/O buffers. 


Each part has a large number of output buffers due to the 
32-bit output bus, plus various status outputs. 


These two facts can make the real world interesting. 


When a large number of the output buffers switch simul- 
taneously, the local Printed Circuit Board (PCB) power 
and ground, and the chip internal power supply lines can 
experience significant noise transients. 


This power supply noise can couple into the internal 
logic’s ECL VCC pins. Since the internal ECL circuitry is 
referenced to the ECL VCC, the power supply noise can 
cause short duration shifts in the threshold levels of the 
internal logic. 


Due to the way ECL circuitry operates, it has much 
smaller noise margins than equivalent TTL circuits. The 
threshold shifts result in lower than normal noise margins 
in already sensitive high speed circuits. These reduced 
noise margins can result in noise-induced logic errors. 


It is, therefore, very important to provide very good power 
distribution and decoupling in a system using the 
Am29300 family. It is strongly suggested that a multi- 
layer PCB be used to provide power and ground planes. 
It is also important to minimize coupling between the 
TTL and ECL VCC pins of any Am29300 bipolar device. 
This can be done in part through good power supply de- 
coupling. 


An additional way to decouple the ECL and TTL VCC pins
is to introduce inductive isolation. The simplest way to do
that is to place a cut in the VCC plane that separates the
ECL supply pins from the TTL pins. This produces a
longer electrical path between the pins, which adds
inductance between the pins. This inductive isolation will
significantly reduce noise coupling.


Some suggested PCB layouts for use with the Am29300
family are shown in Figures 7-1a and 7-1b. The images
are negatives where black indicates an absence of metal
in the VCC plane.


Although significant noise can also occur on the TTL and
ECL ground lines, the ECL circuits are much less sensitive
to this noise. Attempting to isolate the TTL and ECL
ground pins from each other can create more problems
than it solves. Any isolation will reduce the noise in the
ECL circuitry and thereby make the chip internal ECL
ground “different” from the TTL ground. This can reduce
the noise margin in the ECL to TTL conversion logic,
introducing potential for noise-induced errors. It is recommended
that no isolation between grounds be used.


DECOUPLING CAPACITORS 


An added help in providing local VCC to ground decou- 
pling is available in the form of under-chip capacitors. 


Special capacitors for PGA device packages have been 
developed by Rogers Corporation, Q-PAC Div., 2400 
South Roosevelt St., Tempe, AZ. 85282. 


SOCKETS 


Whenever high pin count, expensive VLSI components 
are used in a system, many hardware designers prefer to
have the devices in sockets. This allows easy removal for 
repairs or upgrades and provides an additional test point 
in the system. 


Sockets for the Am29300 family are available from Augat
Corporation, Interconnection Component Div., 33 Perry
Ave., Attleboro, MA. 02703.
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SECTION 8 
Conclusion


There are many ways to skin a cat and surprisingly, 
many more ways to build a computer. This application 
note has tried to guide the reader through just one 
simple implementation. The author hopes some of the 
reasons behind the design choices in a microprogram- 
med computer design were made clear during the course 
of the description. 


Aside from some general notions about how a micropro- 
grammed system works, the reader should walk away 
having noted the following thoughts: 


This design is a full 32-bit processor capable of executing
a full 32-bit add, barrel shift, logical, integer multiply, or
even floating point multiply every 100 ns to 133 ns. That
is a 7 to 10 Million Instructions Per Second (MIPS) rate,
which is (loosely) comparable to 7 times the performance
of a VAX 11/780.
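
The MIPS range follows directly from the cycle times, under the simplifying assumption (made here for illustration) that one instruction completes in every cycle:

```python
def mips(cycle_ns):
    """Instruction rate in millions per second for a given cycle time in ns,
    assuming one instruction completes per cycle."""
    return 1000.0 / cycle_ns


FAST_RATE = mips(100)   # basic 100 ns cycle
SLOW_RATE = mips(133)   # extended 133 ns cycle
```

A 100 ns cycle corresponds to 10 MIPS, and a 133 ns cycle to roughly 7.5 MIPS, which the text rounds down to 7.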


For all that computing horsepower, the real core of this
machine is made from only 6 chips: the Am29300 family
of computer building blocks. That’s an incredible degree
of integration as compared with previous approaches to
high performance microprogrammed computer design.


Most of the logic surrounding the Am29300 family com- 
ponents is not required. The additional logic is used to 
add system flexibility and to show off different aspects of 
microprogrammed design. Very little glue is needed to 
hold this family together. 


There is very little in the way of standard SSI logic 
used. Virtually all the MSI and SSI level logic functions 
were incorporated into Programmable Array Logic. 
This shows the versatility and integration that PALs can 
provide. 


Due to use of Serial Shadow Registers throughout the 
system, there is a reasonable hope that enough of the 
system state can be read and controlled so that debug- 
ging in the factory or field will be simple. This access to 
the internal structure of the machine is gained with very 
little “excess” logic. 


This application note, augmented by 60 pages of PAL 
and Am29PL141 definition files is available as a 
separate booklet; Publication No. O9856A. 
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Product Application 


(a) OPCODE (7) | SCC (1) | DESTINATION (5) | SOURCE 1 (5) | SOURCE 2 (13)


The fast way to build 


a RISC processor 


A family of 32-bit VLSI ICs yields 
reduced instruction-set computers 
with a variety of architectures 


Dhaval Ajmera and Cheng-Gang Kong 
Production Planning and Development Engineers 
Microprogrammable Processes 
Advanced Micro Devices, Inc., Sunnyvale, CA 


Central processing units with reduced
instruction sets fall into
two categories. Single-chip ver- 
sions are champion performers, but 
their fixed instruction sets mean that 
software compatibility can be a prob- 
lem. Others are built from an army 
of discrete components and small-, 
medium-, and large-scale ICs (SSI, 
MSI, and LSI) and so suffer from 
high chip counts, long interchip de- 
lays, and great power dissipation. 
A good compromise between the 
two is a team of a few very large- 






DELAYED 
BRANCH 


scale IC (VLSI) parts—namely, the 
bipolar Am29300 and CMOS Am- 
29C300/familiesof VLSI building 
blocks (see box, “VLSI RISC’’). By 
using these families, it is possible to 
adapt an operating system and in- 
struction set to a reduced-instruc- 
tion-set computer (RISC) architec- 


‘ ture while maintaining software 


compatibility. 

As a family, the 29300 can support 
the extremely fast cycle time of 80 
ns, and both it and the 29C300 group
have a 32-bit fixed word length. That 


Fig. 1. The RISC word for both the
Berkeley and the AMD reduced
instruction set is fixed at 32 bits (a).
In the AMD RISC hardware, the
pipeline structure consists of a
simple, two-level instruction-fetch-and-execute
configuration (b).
Key: SCC = set condition code; IMM = immediate;
IF = instruction fetch; EXE = execution.

Reprinted with permission from Electronic Products, Vol. 29 No. 12, November 17, 
1986. Copyright 1986, Hearst Business Communications Inc. 
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word length affords high precision 
for arithmetic operations as well as 
a wide bandwidth for memory and a 
large (4-gigabyte) addressing capa- 
bility for virtual-memory operations. 

Each family member fulfills a dis- 
tinct function, allowing the RISC 
designer considerable freedom to 
configure them in a variety of archi- 
tectures. Because, for example, the 
Am29334 register file building block 
is functionally separate from the Am- 
29332 arithmetic logic unit (ALU), 
several Am29334s can be used to
vary the size of the register file as 
required. In addition, data from the 
registers can be shared by other par- 
allel devices besides the ALU. 

The high level of integration of the
29300 and 29C300 family members favors
higher performance because interchip
delays are shorter. Also, systems
need fewer and smaller boards
to mount a lower parts count, and
less power is dissipated, both factors
that tend to reduce costs.

The AMD RISC architecture 
closely resembles the RISC I devel- 
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VLSI RISC 


A reduced-instruction-set processor 
could be designed onto a custom VLSI 
chip—for a price. Or it could be con- 
structed from numerous, less integrated 
ICs—in many manhours. The golden 
mean, however, is to turn to already 
available general-purpose VLSI building 
blocks, for these simplify the design job 
yet can be obtained off the shelf. The 
Am29300 family from Advanced Micro 
Devices in Sunnyvale, CA, includes the 
32-bit arithmetic logic unit, the 32-bit 
register file, and the bounds checker 


oped at the University of California 
at Berkeley, which has 33 instruc- 
tions. Basic to both architectures is 
a fixed instruction format. 

Every instruction word is 32 bits 
wide (see Fig. la). Its op code occu- 
pies a field of 7 bits. Three fields 
totaling 23 more bits specify two 
source operands and a single des- 
tination. These fields are always in 
the same position in the instruction 
word—an arrangement that makes it 


[Fig. 2 block diagram. Labeled blocks: instruction register; op-code
encoder (PLA) supplying the 7-bit instruction code to the ALU; program
counter (PC); mux; bounds checker (Am29337); address register; address
generation; registers (4 x Am29334); MUX-A; ALU (Am29332); data-out
register; MUX-B; constant generator; address bus; data bus.]

Fig. 2. The AMD RISC system includes a
set of four Am29334 registers and an
Am29332 ALU, which derives its 7-bit
op-code controls from a PLA. The
Am29337 bounds checker identifies all
memory references to the file registers.


needed to build the RISC described in the 
accompanying article. 

The Am29332 ALU is housed in a 168- 
pin grid array and sells for $495 each in 
100-unit quantities. The Am29334 four- 
port, dual-access register file is pack- 
aged in the 120-pin grid array and sells 
for $180 each in 100-unit quantities. The 
Am29337 bounds checker comes in 28- 
pin ceramic DIP and is priced at $22 in 
100-unit quantities. Other building blocks 
in the Am29300 family are available. 


relatively simple to decode the op 
code in parallel with the operand 
access. 


A two-level pipeline 

The pipeline of the AMD RISC 
is a simple, two-level structure. One 
level fetches an instruction while the 
other is executing the instruction 
fetched immediately beforehand 
(see Fig. 1b). 

This concurrency, however, cre- 
ates difficulties with branch instruc- 
tions. A conditional branch instruc- 
tion cannot make its condition avail- 
able until it has been executed. 
Therefore, the instruction fetched 
during its execution might not be the 
correct one. 

To circumvent this pipeline lock- 
step dependency, a method called 
delayed branch is used. A code re- 
organizer (a program) rearranges 
the sequence of instructions so that 
the one immediately following the 
branch instruction is always exe- 
cuted despite the branching condi- 
tion (see Fig. 1b again). In 9 out of 
10 cases, a useful operation can be 
inserted. The rest of the time a NOP 
fills in. In other words, whatever the 
result of the branch instruction, it is 
executed only after an intervening 


ELECTRONIC PRODUCTS / November 17, 1986 




instruction has been dealt with. 
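
The delayed-branch behavior described above can be illustrated with a small simulation. This is only a sketch: the `(kind, target)` program encoding and the function name `run_delayed_branch` are assumptions made for the example, not part of the AMD RISC design, and every branch in it is treated as taken.

```python
def run_delayed_branch(program, max_steps=50):
    """Return the addresses executed under a one-slot delayed branch.

    Each entry of `program` is a (kind, target) pair; `kind` is "op", "nop"
    or "branch", and `target` is used only by a (taken) branch.
    """
    executed = []
    pc = 0
    pending = None  # target chosen by the instruction executed one step earlier
    while 0 <= pc < len(program) and len(executed) < max_steps:
        kind, target = program[pc]
        executed.append(pc)
        # The instruction after a branch always runs; the branch target only
        # takes effect for the fetch that follows the delay slot.
        next_pc = pending if pending is not None else pc + 1
        pending = target if kind == "branch" else None
        pc = next_pc
    return executed


program = [
    ("op", None),      # 0
    ("branch", 4),     # 1: taken branch to address 4
    ("op", None),      # 2: delay slot, executed regardless of the branch
    ("op", None),      # 3: skipped
    ("op", None),      # 4: branch target
]
print(run_delayed_branch(program))  # [0, 1, 2, 4]
```

The instruction at address 2 runs even though the branch at address 1 is taken, which is why the code reorganizer tries to place a useful operation (or, failing that, a NOP) in that slot.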

Exceptions are another pipeline
hazard. When one occurs, the pipeline
contents are duplicated by three
registers in the program counter
unit. This unit is routed to the ALU
through the A multiplexer (see Fig. 2),
a feature that allows the return
address to be saved when a call
instruction is executed. During exception
handling, this path also makes
it possible to save the contents of the
three program counter registers and
to use them to restart the processor.









[Fig. 3 diagram labels: (a) incoming parameters, local, outgoing
parameters, global, with boundaries marked at R15, R16, R21 and R22;
(b) global registers shared by all procedures.]

Fig. 3. The register window of the
AMD RISC is functionally divided
into four sections (a). Every procedure
of the program shares the 10
global registers (b).


The instruction set enables constants
to be formed through the instruction
word directly. Before a
constant can be fed into the ALU,
however, some data has to be rerouted
to generate it. This rerouting
is done by the constant generator,
which in essence uses 32 two-input
multiplexers to produce the proper
constant. The result is then fed via
the B multiplexer to an ALU input.

The control section of the AMD
RISC is relatively simple (see Fig.
2 again). All the control signals are
derived from the instruction’s 7-bit
op code through a programmable
logic array (PLA). The Am29332 is
a 32-bit-wide ALU that performs all
arithmetic and Boolean operations.
A high data-transfer rate is provided
by a powerful, orthogonal instruction
set. To enhance system performance,
the device also features a 64-bit-in,
32-bit-out funnel shifter, as well as
a 32-bit barrel shifter and a priority
encoder.



The Am29334 register file is a
four-port, dual-access file that can be
used to implement a distinctive feature
of the Berkeley RISC: its so-called
overlapped register windows.
This overlapping improves the speed
at which the procedures (or subroutines)
in an application program can
pass parameters among themselves
and the main program in a call-return
sequence. Berkeley researchers
developed the technique after finding
that parameter passing is one
of the most time-consuming events
in the execution of high-level languages.

Four Am29334s, with the aid of 
some SSI and MSI chips, provide 
seven register windows and 10 global 
registers. Altogether, they easily fit 
onto a standard hex card. 

One register window is allocated 
to each procedure. Each window 
consists of 32 registers; thus at any 
time just 32 registers are visible to 
the currently executing procedure. 



The 32 are functionally partitioned
into four sections: 10 global and 10
local registers, as well as 6 apiece for
incoming and outgoing parameters
(see Fig. 3a). (In the Berkeley
RISC, there are 138 registers
grouped into 8 register windows.)
The 10 global registers (R22 to
R31) are shared by every procedure
of the program (see Fig. 3b). They
are used primarily for globally referenced
items such as a system’s
commonly applicable constants.
The 10 local registers (R6 to
R15), dedicated to the procedure itself,
store local variables.

[Fig. 4a logic diagram: a comparator tests the 5-bit register specifier
RS4-RS0 against '10101' (B > A). (1) For R0 to R21, an incrementer adds
RS4 (as carry-in) to the upper three bits of the current-window pointer,
and RS3-RS0 supply the lower four address bits (XXX 0000 + Y YYYY).
(2) For R22 to R31, the address is 111 followed by RS3-RS0
(111 0000 + YYYY). A mux selects the 7-bit address to the register file.
Key: CWP = current-window pointer; RS = register specifier.]

Fig. 4. The AMD and Berkeley RISC register numbering are 1’s
complements of each other (a). Also, either procedure can only be
translated into the other if they are mapped one on one (b). Both the
1’s complementing and the mapping are simple operations.

Six registers (R0 to R5) accept
incoming parameters from the calling
procedure for use by the called
procedure. They are also used to return
results from the called to the
calling procedure.

When the called procedure in turn
summons another, it puts its outgoing
parameters in six registers (R16
to R21) that then overlap the six
incoming-parameter registers of this
last procedure.

32-Bit Computer Performance Benchmarks

[Column headings in the original: Benchmark; Berkeley RISC I (ms);
Typical 32-bit superminicomputer (ms). A single column of values is
legible and is listed here in its printed order.]

Benchmark                  (ms)
E-string search            0.59
F-bit test                 0.29
H-linked list              0.12
K-bit matrix               1.29
I-quicksort                151.2
Ackermann (3,6)            5,120
Recursive Q sort           1,840
Puzzle (subscript)         9,400
Puzzle (pointer)           4,160
SED (batch editor)         5,610
Towers of Hanoi (18)       12,240
Average times faster       1

[Fig. 4b diagram: main memory, with a lower bound and an upper bound
delimiting the region that maps to the on-chip register file.]

With such a register organization,
parameters can be rapidly transferred
between procedures, as the
three register windows in Figure 3b
illustrate. When procedure A calls
procedure B, all the parameters pass
through the outgoing-parameter registers
of A to become the incoming-parameter
registers of B, which can
operate on these parameters without
accessing the stack memory. The
same principle applies when B calls
C. When C finishes, the results return
through the outgoing parameters
of B (or incoming of C). In turn,
B also returns its results through the
outgoing parameters of A.
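
The call-return sequence described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the AMD hardware: the class name `WindowedRegisters`, the 128-entry array, and the placement of the globals at index 118 and above are illustrative choices, and window overflow is not handled.

```python
WINDOW_OFFSET = 16   # each window starts 16 registers after the previous one
GLOBAL_BASE = 118    # assumed location of the 10 globals, above the windows


class WindowedRegisters:
    """Overlapped register windows: R0-R21 are per-window, R22-R31 are global."""

    def __init__(self):
        self.regs = [0] * 128
        self.cwp = 0  # current-window pointer (base of the visible window)

    def _index(self, r):
        if r >= 22:                      # global registers ignore the window
            return GLOBAL_BASE + (r - 22)
        return self.cwp + r              # windowed registers are CWP-relative

    def read(self, r):
        return self.regs[self._index(r)]

    def write(self, r, value):
        self.regs[self._index(r)] = value

    def call(self):
        self.cwp += WINDOW_OFFSET        # caller's R16-R21 become callee's R0-R5

    def ret(self):
        self.cwp -= WINDOW_OFFSET


rf = WindowedRegisters()
rf.write(16, 7)          # procedure A places an outgoing parameter in R16
rf.call()                # A calls B
arg = rf.read(0)         # B sees the same register as incoming R0
rf.write(0, arg * arg)   # B returns its result through the same register
rf.ret()
print(rf.read(16))       # 49
```

No data is copied at the call or the return; only the window pointer moves, which is the source of the speed advantage the text describes.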

The register numbering used in 
the AMD RISC for the windowing 
scheme is the 1’s complement of its 
Berkeley RISC counterpart, a con- 
vention easily implemented with 
simple address-generation logic (see 
Fig. 4a). (A one-to-one mapping still 
remains between these two proces- 
sors after this numbering change.) 

The address generation logic maps
any register number greater than 21
into the global registers. The mapping
is done by appending the lower 4 bits
of the register specifier to three 1s,
which places the register at a high
address in the register file.

To generate the address of a local
register, the pointer to the current
window (logically a 7-bit register)
is added to the register specifier. The
current-window pointer is the base
pointer for the currently visible registers.
It is advanced to the next window
base pointer when a call instruction
is executed; it is restored to the
previous window base pointer when
a return is executed. Since each register
window is offset from the previous
window by 16 registers (due to
the overlap illustrated in Fig. 3b),
the lower 4 bits of the current-window
pointer are always zero. Therefore,
an incrementer at the fifth bit
position of this pointer can be used
to add in the register specifier. Thus
connecting the fifth bit of the register
specifier to the carry-in of the
current-window pointer’s incrementer
generates the proper address for
registers 0 to 21.

The comparator generates the 
proper select signal to gate the ap- 
propriate address (global or local) 
to the register file. With the pro- 
jected 80 ns of the combined propa- 
gation delay of the Am29332 and 
Am29334, a 100-ns system cycle time 
can be easily obtained. 
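
The address-generation logic of the preceding paragraphs can be modeled at the bit level. The sketch below is an illustration only; the function name `register_file_address` and the constant `GLOBAL_LIMIT` are hypothetical, and the model assumes the 7-bit address and 5-bit register specifier described in the text.

```python
GLOBAL_LIMIT = 0b10101  # 21: register numbers above this are global


def register_file_address(cwp, rs):
    """Form the 7-bit register-file address from CWP and a 5-bit specifier."""
    if not 0 <= rs <= 31:
        raise ValueError("register specifier must be 5 bits")
    if cwp & 0xF or not 0 <= cwp <= 0x7F:
        raise ValueError("CWP must be a 7-bit value with its low 4 bits zero")
    low4 = rs & 0xF
    if rs > GLOBAL_LIMIT:
        # Comparator selects the global path: three 1s, then RS3-RS0.
        return (0b111 << 4) | low4
    # Local path: incrementer on the upper 3 bits of CWP, carry-in = RS4.
    upper3 = ((cwp >> 4) + (rs >> 4)) & 0b111
    return (upper3 << 4) | low4


print(register_file_address(0, 16))    # 16: caller's first outgoing register
print(register_file_address(16, 0))    # 16: callee's first incoming register
print(register_file_address(0, 22))    # 118: first global register
```

Because the low four bits of the window pointer are always zero, no full adder is needed: the low four specifier bits pass straight through and only a 3-bit increment is performed.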

The register file, part of the system’s
run-time stack, is mapped into
the main memory (see Fig. 4b). The
Am29337 bound-checking facility
detects any memory reference to this
section and reports it to the CPU.
The CPU can then redirect the reference
to the proper data store in the
register file.


RISC’s minimalist philosophy


A new style of computer architecture has
stirred a lot of attention recently. It’s
called RISC, for reduced instruction-set
computer. Examples of it are the University
of California at Berkeley’s RISC I
and RISC II, IBM’s 801 project, and Stanford
University’s MIPS (for microprocessor
without interlocked pipe stages).

The time-honored route in system design
has been to leverage on progress in
IC technology by increasing the complexity
of computer architecture, with the
goal of narrowing the “semantic gap”
between the high-level languages of programming
and the bit languages of machines.
Complex instruction-set computers,
or CISCs, are one result. But the
side effects are unpleasant: longer design
times, more numerous design errors,
and inconsistent implementations.

This outcome triggered an about-turn
in favor of simplicity. RISC designers try
to select only the most frequently used,
primitive instructions and to execute
them very fast. Some of the main architectural
design principles of the RISC
are:

• Execute one instruction per cycle.
Program traces show that the most
heavily used instructions are quite primitive.
They also execute in one cycle.
Hardwiring instead of microprogramming
them enhances overall performance by
eliminating the overhead incurred in microcode
interpretation. The lengthy,
highly complex, and infrequently summoned
instructions provided by the CISC
but omitted on the RISC can be implemented
by software subroutines.

• Use a fixed instruction format.
A fixed instruction format greatly simplifies
instruction decoding and thus the
hardware. Each field of the instruction
word is dedicated to a particular function.
For example, a fixed field is dedicated to
the op code, and two or three fields are
dedicated to operand specifiers. An added
benefit is that an instruction with this
format may allow some signals to be derived
directly from it, permitting several
operations to overlap.

• Employ a load/store architecture.
Memory references alone are done by
load- or store-register operations. All the
other operations are register-to-register.
The simplicity of this addressing mode
makes it easy to implement. The absence
of complex addressing modes also makes
it easier to restart instructions when an
exception occurs.

• Support high-level languages.
The simple instruction set supplies the
compiler with only the most primitive operations.
From these the compiler can
compose instruction sequences that are
tailored to the exact requirements of the
programming language. In some architectures,
the hardware savings realized
by the simple implementation is invested
in speeding up some of the high-level
language’s more time-consuming operations.
The University of California at
Berkeley RISC processor, for instance,
includes a large register file for speeding
up the sequence of calling and returning
from a procedure.


Performance evaluation 


Usually it is hard to compare one 
architecture to another with any ac- 
curacy. The AMD RISC, though, is 
functionally compatible with Berke- 
ley’s RISC I, so that published pa- 
rameters can serve as a basis for pre- 
dicting their relative performance. 
The comparison is also predicated 
upon the following four assump- 
tions: 

• A 100-ns cycle time. The Am29332
and Am29334 will contribute 80 ns
to the total cycle time, and the register
address generator and source
multiplexer add another 20 ns (provided
Schottky TTL components
form the glue logic of the circuit).

• A 100-ns instruction cache. It has
been established that an 8-Kbyte directly
mapped instruction cache can
provide a hit ratio of 99.85% on VAX-11
(programs written in C and running
under Unix). High-speed
RAMs (around 45 ns) are available
from which a 100-ns instruction-cache
memory with a good hit ratio
can be easily constructed.

• The execution of the same instructions
as RISC I. Register renaming
of the code is easy.

• No adverse impact on performance
due to the AMD RISC’s having one
fewer register window (Berkeley’s
RISC I has eight register windows
versus seven for AMD).

For a simulated RISC I running
11 benchmark programs written in
C, the system cycle time was 400 ns.
For the AMD system running the
same programs, it was 100 ns, or four
times shorter. Further, as the table
indicates, the AMD implementation
averages about eight times faster
than a typical 32-bit superminicomputer.


FAULT-TOLERANT CHIPS
INCREASE SYSTEM
RELIABILITY





Using parity checking and a master/slave duplication 
technique, a bipolar chip set provides an interlocking 
fault-detection scheme that enhances fault tolerance. 


by Tim Olson 


Fault-tolerant computers have been used in satellites, 
aircraft, and industrial control and communications 
applications. The use of fault-tolerant techniques is 
currently being extended into other arenas, includ- 
ing on-line transaction processing and increasingly 
complex very large-scale integration circuitry. In 
addition, the rising cost of system maintenance and 
repair is causing a demand for fault-tolerant system 
building blocks that enhance system availability and 
reliability. 

The Advanced Micro Devices 32-bit, micropro- 
grammable chip set addresses these needs. The 
Am29300 family, which consists of the Am29332 
arithmetic logic unit (ALU), Am29331 sequencer, 
Am29334 register file, Am29325 floating-point pro- 
cessor and Am29323 multiplier, uses an interlock- 
ing fault-detection scheme to provide fault tolerance. 
This detection scheme consists of a parity-check sys- 
tem and a master/slave duplication technique. 


Add a bit 

Parity-check codes are a form of error detection
in which a single parity bit is appended to a group
of data bits. The addition of this single bit changes
the number of zeros and ones within the bit group.
If, with the addition of the parity bit, the group has
an even number of ones, the group has even parity;
if it has an odd number of ones, the group has odd
parity. Parity-check codes can detect all single-bit
errors, as well as errors that involve an odd number
of bits. For groups with an odd number of bits,
even parity can detect the all-ones condition and odd
parity, the all-zeros condition.

Tim Olson is a product engineer for Advanced Micro
Devices (Sunnyvale, CA). He holds an MS in electrical
engineering from the University of Arizona.
Order # 08087A

Reprinted with permission from Computer Design.

To detect data-transmission errors, the Am29300
family checks parity according to bytes. In this
scheme, a parity bit is appended to each byte in the
32-bit word, resulting in four 9-bit groups. Each
group contains a single parity bit. There are three
reasons for using byte parity: fault coverage, decreased
cycle time and byte-write capability. Fault
coverage is increased by providing a single parity bit
per byte. This technique catches many faults that
would go undetected if a single parity bit per word
were used.

Decreased cycle time refers to the fact that four
byte-wide parity circuits operating in parallel can
generate and check parity faster than a single 32-bit
parity-generation system. Byte-write capability provides
other advantages. In byte parity, individual
bytes can be written back into the register file without
reading the rest of the 32-bit word to compute
parity.

Advanced Micro Devices’ Am29300 32-bit, bipolar,
microprogrammable chip set consists of five devices
that support fault-tolerant designs by providing parity
checking/generation (a) and master/slave duplication
(b) as fault-detection techniques. Parity checking provides
fault coverage for data storage and interchip
connections. More elaborate coverage is provided by
master/slave checking. With this technique, two identical
copies of a device are used in parallel, with one
designated as master and the other as slave. For increased
reliability, even the checking scheme can be
checked.

The Am29300 family uses even parity, which extends
fault coverage to include a floating input bus.
This parity scheme includes an all-ones failure mode, 
which occurs if a failure in the source device pre- 
vents it from driving the bus or if a failure in the 
control path prevents the source device from being 
accessed. Parity bits are stored in the register file, 
checked when input to the ALU and multiplier, and 
then generated as an output. If a parity error is de- 
tected on either of the two input buses, the Parity- 
Error output is asserted. This output is active high 
to provide fault detection for the error signals. 
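
The byte-parity scheme can be sketched as follows. This is an illustration, not the on-chip circuitry: the function names are hypothetical, parity bits are held in a list with the least significant byte first, and a floating bus is assumed to read as all ones on both the data and parity lines.

```python
def even_parity_bit(byte):
    """Parity bit that makes the 9-bit group hold an even number of ones."""
    return bin(byte & 0xFF).count("1") & 1


def generate_parity(word):
    """Return four parity bits for a 32-bit word, least significant byte first."""
    return [even_parity_bit(word >> (8 * i)) for i in range(4)]


def check_parity(word, parity_bits):
    """Return the indices of bytes whose 9-bit group fails the even-parity check."""
    return [i for i in range(4)
            if even_parity_bit(word >> (8 * i)) != parity_bits[i]]


word = 0x12345678
parity = generate_parity(word)
print(check_parity(word, parity))                # []  (no error)
print(check_parity(word ^ 0x0100, parity))       # [1] (single-bit error in byte 1)
print(check_parity(0xFFFFFFFF, [1, 1, 1, 1]))    # [0, 1, 2, 3] (floating bus)
```

The last case shows why even parity is chosen: nine ones in a group is an odd count, so an undriven bus is reported as a parity error in every byte.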

This parity scheme provides fault detection on 
both the data storage and the interchip connections. 
Since the Am29332 ALU and the Am29323 multi- 
plier perform operations on data that cannot carry 
parity bits, however, a more elaborate checking 
scheme is used. This system is called master/slave 
checking. 


More than one copy 

Master/slave checking uses duplication as a fault- 
detection technique. Two identical copies of a de- 
vice are used in parallel; one is designated as mas- 
ter, the other as slave. The master device computes 
a result from the inputs and moves its result to the 
chip outputs. The slave device also computes a re- 
sult from the inputs, but all of its outputs (except 
for MS-Error) are changed to inputs that carry the 
results of the master. 

The slave compares its result with the result of the 
master and signals any discrepancy on the MS-Error 
output. This output, like Parity-Error, is active high 
to provide fault detection for the error signal. Mas- 
ter/slave checking can detect multiple failures in both 
the master and the slave devices, as long as at least 
one failure is nonoverlapping. This checking system
also detects output bus contention, which is indicated 
by the MS-Error output on the master device. This 
output is activated when the master result and the 
output bus fail to match. 
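
A behavioral sketch of one master/slave cycle follows. It is a simplified model under stated assumptions: a 32-bit add stands in for the device, the `stuck_at_one` mask is a hypothetical way of injecting a fault, and `bus_override` stands in for another driver contending for the output bus.

```python
MASK32 = 0xFFFFFFFF


def alu_add(a, b, stuck_at_one=0):
    """32-bit add; `stuck_at_one` is a hypothetical fault mask forcing bits high."""
    return ((a + b) & MASK32) | stuck_at_one


def master_slave_cycle(a, b, master_fault=0, slave_fault=0, bus_override=None):
    """Run one cycle and report the MS-Error output of each device."""
    master_result = alu_add(a, b, master_fault)
    # Normally the master drives the bus; an override models bus contention.
    bus = master_result if bus_override is None else bus_override
    slave_result = alu_add(a, b, slave_fault)
    return {
        "bus": bus,
        "master_ms_error": master_result != bus,  # master sees contention
        "slave_ms_error": slave_result != bus,    # slave sees a mismatch
    }


print(master_slave_cycle(1, 2))
print(master_slave_cycle(1, 2, master_fault=0x4))
print(master_slave_cycle(1, 2, master_fault=0x4, slave_fault=0x4))
```

The third call illustrates the nonoverlapping condition in the text: when both devices fail in exactly the same way, the comparison alone cannot reveal the fault.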

For systems that must operate nonstop, master/ 
slave techniques may also be applied at the board 
level. Two sets of master/slave pairs are used; one 
is active and the other is standby. If the slave of the 
active pair signals an MS-Error, the active pair is 
turned off and the standby pair is activated. The 
standby pair may also perform transactions while 
the active pair is running, resulting in twice the 
throughput of normal operation.

The ALU, multiplier and sequencer all have a 
master/slave operation mode. This mode, combined 
with parity checking of the data paths, provides com- 
plete interlocking fault detection on a cycle-by- 
cycle basis. 





The fault recovery process can identify two types 
of faults: permanent or transient. Permanent, or 
hard faults, are caused by physical changes in the 
hardware (failures), while transient, or soft faults, 
are due to unstable hardware or temporary environ- 
mental conditions. Detection of a permanent fault 
may cause a standby unit to take over for the failed 
device. 

When transient faults are detected, on the other 
hand, the microinstruction that faulted will be 
restarted after the transient condition disappears. In 
either case, the faulted microinstruction must be 
aborted, so that no state change occurs to disrupt 
the restarting of the microinstruction. 

[Figure labels: a trap occurs while the sequencer is executing at
address A; an external device drives the trap vector, address “B.”]

To restart the microinstruction, the sequencer performs
traps at any microinstruction boundary. A trap
condition is signaled by the simultaneous assertion
of the interrupt-request and force-continue
signals with the carry input (Cin) disabled, which
causes the address incrementer to pass the current
address instead of the next address. The sequencer
then puts the Y output bus in a high-impedance state,
allowing an external trap vector to be placed on it.
The sequencer pushes the trapped microinstruction
address onto the internal stack and starts fetching
microinstructions, using the trap vector as the starting
address. The address of the aborted microinstruction
is stored on top of the stack, and the microinstruction
is restarted by executing a return instruction. When
the Hold input is asserted, updates of the ALU’s
internal state are inhibited. This ensures that the
aborted microinstruction has no effect.
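
The trap-and-restart sequence can be modeled in a few lines. The class below is a toy model, not the Am29331: the name `Sequencer`, the `step` and `ret` methods, and the example addresses are assumptions, and `address` stands for the microinstruction currently executing.

```python
class Sequencer:
    """Toy microprogram sequencer with an internal return stack."""

    def __init__(self):
        self.address = 0   # address of the microinstruction being executed
        self.stack = []

    def step(self, trap_vector=None):
        """Advance one microcycle and return the next microinstruction address."""
        if trap_vector is not None:
            # Trap: push the current (not the next) address, then take the
            # externally supplied vector as the new starting address.
            self.stack.append(self.address)
            self.address = trap_vector
        else:
            self.address += 1
        return self.address

    def ret(self):
        """Return instruction: restart the trapped microinstruction."""
        self.address = self.stack.pop()
        return self.address


seq = Sequencer()
seq.step()                          # executing at address 1
seq.step()                          # executing at address 2 when the trap occurs
print(seq.step(trap_vector=0x40))   # 64: trap routine starts at the vector
print(seq.stack)                    # [2]: trapped address saved on the stack
print(seq.ret())                    # 2: the aborted microinstruction restarts
```

Pushing the current address rather than the incremented one is what allows the return instruction to re-execute the microinstruction that faulted instead of skipping it.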


Fault-tolerant CPU design 

In order to show how the Am29300 family mem- 
bers interact to perform fault detection, recovery and 
isolation, consider a simple CPU design. In this de- 
sign, the data path consists of two sets of register 
files and two ALUs in a master/slave configuration. 
Because new data may already have been written to 
the register file before a fault is signaled, two reg- 
ister file sets are required. One register file set holds 
the working address and data registers, while the 
other set holds backup copies of these registers that 
are used in error recovery. 

The ALUs perform address and data calculations,
which are used to address memory via the data-out,
data-in and address registers. These registers are built
from Am29818 diagnostics registers that offer off-line
testing and fault diagnosis. The control path
starts with the instruction register, which consists
of four serial shadow registers. The instruction is
applied to a mapping PROM to derive the starting
microcode address for the sequencer, which is built
from two Am29331 sequencers in a master/slave
configuration. The microinstruction is fetched from the
writable control store and loaded into pipeline registers,
which distribute control throughout the CPU.

[Figure labels: the trap routine starts executing at B; the trapped
address is on the stack.]

Errors are signaled with Parity-Error and MS-Error outputs and prioritized by an interrupt controller. The
sequencers then trap the microinstruction that is being executed. To restart a microinstruction when a failure
occurs, the sequencer can perform traps at any microinstruction boundary. When a trap condition is signaled, the
sequencer changes the Y output bus to a high-impedance state, allowing an external trap to be placed on it.


Fault detection, recovery and isolation 

During instruction execution, errors are detected
on a cycle-by-cycle basis by the sequencer and ALU
master/slave pairs. They are signaled with the Parity-
Error and MS-Error outputs. These error signals are
prioritized by a vectored priority-interrupt controller,
which causes the sequencers to trap the microinstruction
that is currently executing. The trap vector is
then put on the Y output bus. The controller also
asserts the Hold pin on the ALUs, which prevents
the trapped microinstruction from updating the internal
state of the ALU, and disables writes to the
backup register file. With writes to the backup register
file disabled, the state that existed prior to the
trapped microinstruction is kept intact.

Microinstruction processing then begins with the
trap routine associated with the highest priority fault
indication. This routine can determine whether the
fault is transient or permanent. If the fault is transient,
the trapped microinstruction must be restarted.
The trap handler first restores the state of the register
file by copying each of the registers in the backup
register file into the working register file, restoring
the registers to the values they held prior to the fault.
Any other state that was saved during trap processing
is also restored during this process. The sequencers
then perform a return instruction, popping
the trapped microinstruction address from the stack.
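
The working/backup register arrangement used for transient-fault recovery can be sketched as below. This is a hedged model: the class name `ShadowedRegisterFile`, the 16-register size, and the assumption that the backup copy is updated whenever a microinstruction completes without a fault are illustrative, since the article does not give those details.

```python
class ShadowedRegisterFile:
    """Working register file with a backup copy used for error recovery."""

    def __init__(self, size=16):
        self.working = [0] * size
        self.backup = [0] * size

    def execute(self, writes, fault=False):
        """Apply one microinstruction's register writes; return True if it completed."""
        for reg, value in writes.items():
            self.working[reg] = value    # may land before the fault is signaled
        if fault:
            return False                 # trap taken: backup writes are disabled
        for reg, value in writes.items():
            self.backup[reg] = value     # assumed: backup tracks completed writes
        return True

    def restore(self):
        """Trap handler step: copy every backup register into the working file."""
        self.working = list(self.backup)


rf = ShadowedRegisterFile()
rf.execute({1: 10})
rf.execute({1: 99}, fault=True)      # faulted write corrupts the working copy
print(rf.working[1], rf.backup[1])   # 99 10
rf.restore()
print(rf.working[1])                 # 10: state prior to the fault
rf.execute({1: 99})                  # restarted microinstruction now completes
print(rf.working[1], rf.backup[1])   # 99 99
```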

To increase system availability, permanent faults
must be isolated quickly. This usually involves running
a series of test patterns through the devices to
determine which ones have failed. These patterns can
be loaded and tested quickly using the serial shadow
registers. All of the serial shadow registers in the
CPU design are connected by a serial link that forms
a diagnostics loop. Arbitrary patterns can be loaded
serially through the loop, then clocked through in
a single system cycle. The resulting state can be read
out from the loop for use in isolating the failed
device.


Checking the checkers 

Failures in checking devices are even more serious. 
A failed checker can give a false indication of error 
or a no-error condition. While false indications of 
failure are tolerable, a no-error condition often re- 
sults in undetected faults. 

There are three basic fault detectors in the CPU 
design: the Am29332 parity checker, the Am29332 
master/slave checker and the Am29331 master/slave 
checker. These fault-detection circuits must be veri- 
fied during system initialization, and their opera- 
tional status should be confirmed periodically during 
subsequent operation. 

Fault injection, which is the process of deliberately
causing a fault in the part of the system that
is checked by the fault-detection hardware, can be
used to perform this verification. The parity-check
circuitry can be tested by loading a word with bad
parity into the data-in register via the serial link. It
is then loaded into the register file and used in an
ALU operation. This procedure should detect a parity
error.

Another method of verifying the parity checker 
is to issue a microcode instruction that performs an 
ALU operation while the register-file outputs are in 
a high-impedance state. The parity checker should 
detect the all-ones condition and flag the error. 

Master/slave checking can be verified on the ALU
by using the Hold input. The status registers in the
master and slave are first set to a known equivalent
state. The next microinstruction alters that state, but
asserts the Hold input on one of the devices, inhibiting
the status update. A master/slave error, caused
by the differing status outputs, should occur. Master/slave
checking can also be verified on the
Am29331 sequencers by executing a jump instruction
while asserting the force-continue input on one
of the parts. The part without the asserted force-continue
input executes the jump, causing a nonsequential
address for the next microinstruction. The
force continue asserted on the other sequencer overrides
the jump instruction, causing the next microinstruction
address to be sequential. This results in
differing addresses which, in turn, cause a master/slave
error.

In this fault-tolerant CPU system, designed around 32-bit
bipolar building blocks, the arithmetic logic unit, multiplier
and sequencer have a master/slave operation mode. This
mode, combined with parity checking of the data paths,
provides interlocking fault detection on a cycle-by-cycle basis.



The AMD family extends many of the concepts
of fault-tolerant computing, including parity checking
and master/slave duplication, into the 32-bit arena.
This fault-detection scheme can identify both
permanent and transient faults, ensuring broad-based
fault protection throughout the system.


Designer’s Guide to:
Floating-point processing, Part 1


Floating-point math
handles iterative and
recursive algorithms





Floating-point arithmetic gives you better dynamic 
range and precision than integer arithmetic, but it 
needs careful implementation. Part 1 of this 3-part 
series discusses possible sources of error you may 
encounter when using floating-point hardware, and it 
reviews the current standards. Part 2 will describe 
the advantages of fast array processors, and part 3 
will discuss algorithmic options for floating-point 
processors and considerations when implementing a 
complete system. 


Charlie Ashton, Advanced Micro Devices Inc 


Many signal-processing algorithms, such as fast Fourier
transforms, generate outputs whose magnitudes
far exceed those of the inputs. Nevertheless, those
outputs must retain the precision of the input operands
if the accuracy of the computation is not to be so
severely degraded as to render the results meaningless.
For these and similar applications that use iterative
or recursive algorithms, true floating-point operation
often furnishes the only acceptable number
representation.

Until recently, you needed a very good reason to give
your system floating-point hardware. It was large,
expensive, power-hungry, and relatively slow (although
faster than the software-based implementations
needed to perform comparable operations). However,
the introduction of fast VLSI array processors has
changed the picture. These devices (such as Weitek’s
1032/1033 and AMD’s Am29325) can stand alone and are
implemented on one or two chips. You can now economically
use floating-point hardware in applications whose
size and budget constraints would previously have
forced the use of fixed-point hardware or floating-point
software.

The new chips won’t dissipate all your potential 
headaches, of course. Just one of the many choices you'll 
have to make is which standard to support. The four 
most commonly used standards (IEEE, DEC, IBM, 
and MIL-STD-1750A) have subtly different binary rep- 
resentations of floating-point numbers. Each standard 
has advantages and disadvantages for specific types of 
computational problems. This series of articles covers 
some of the theoretical considerations you'll have to 
take into account, as well as some specifics on the 
available chips. 

VLSI processors now make floating-point
hardware cost effective in applications with
severe budget or size constraints.

The manner in which a system represents floating-point
numbers clearly affects both the dynamic range
and the precision of the system. The most obvious way
to represent numbers is to use a signed exponent and a
signed fraction (Table 1). A large exponent field obvi- 
ously supports a large dynamic range: A 2-digit expo- 
nent, for example, implies a dynamic range of 10!, 
whereas a 3-digit exponent increases the dynamic 
range to 10’. Similarly, the more digits you can 
include in the fraction, the greater will be the precision 
of the number, especially if the number is normalized so 
that the left-most digit of the fraction is nonzero. 
Leading zeros in the fraction of an unnormalized num- 
ber clearly reduce the precision of that number. As a 
general principle, then, the precision of a floating-point 


TABLE 1—SIGNED vs BIASED EXPONENTS 


DECIMAL SIGNED 
NUMBER EXPONENT FRACTION 


~123.45 10*8 0.12345 
+0.0000678 10-4 0.678 


DECIMAL BIASED 
NUMBER EXPONENT FRACTION 


~123.45 5+3=8 0.12345 
+0.0000678 5-4=1 0.678 


number depends on the length of its fraction, and the 
dynamic range depends on the size of the exponent and 
the radix. 

In practice, floating-point hardware generally uses a 
biased exponent for two reasons. First, use of a biased 
exponent avoids problems that follow from the need to 
handle negative numbers in the exponent circuitry. 
Second (and perhaps more important), a suitable choice 
of bias can ensure that you'll be able to compute the 
reciprocals of all the representable numbers without 
exponential overflow or underflow. You'll find that 
overflow and underflow cause plenty of problems in 
computing the fraction portion of the output (see box, 
“Dealing with underflow and overflow”). You certainly 
don’t want to introduce them into exponential computa- 
tions as well. 

Biased exponents and normalized fractions are the 
features that give true floating-point representation a 
clear advantage over block floating-point and integer 
formats. To double the dynamic range of an integer 
word, you have to double the number of bits in it. To 
obtain the same result in true floating-point operation, 
you need to add only one bit to the exponential field. In 
fact, a 32-bit floating-point number in IEEE format has
a dynamic range equivalent to that of a 276-bit
2’s-complement integer.

Despite the high precision and large dynamic range 
of normalized floating-point numbers, floating-point 
systems do not altogether escape the effect of quantiza- 
tion (rounding) errors. You can think of a floating-point 
system as producing an infinitely precise result (ie, a 
fraction of unlimited length, abbreviated “IPR”), which 
is then rounded to fit into the destination format. 
Typically, this strategy means that some of the low- 
order fraction bits are lost. Consequently, whenever 
the destination format lacks enough bits to accommo- 
date the IPR, rounding introduces quantization errors, 
which in turn result in system noise. Consider, for 
example, the multiplication of two numbers in a 4-digit 
decimal system: 


(0.8102 x 10*) x (0.8001 x 10-7)=0.6410401 x 10°. 


The IPR is rounded to 0.6410 x 10~‘ to fit the destina- 
tion format, thus introducing a quantization error. In 
practice, quantization errors during a long computation 
will be random, and the overall effect will be analogous 
to an increase in system white noise. If the quantization 
errors are not random, they may appear as system 
nonlinearities and, as a consequence, cause serious 
problems in such applications as spectral analysis. 
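
The rounding step described above can be sketched with Python's `decimal` module. This is only an illustration: the two operands are hypothetical values chosen for the sketch, not the ones in the article's example, and the 4-digit destination format is modeled by a `Context` with `prec=4`.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Hypothetical operands, each with a 4-digit fraction.
a = Decimal("0.1234E3")
b = Decimal("0.5678E-2")

# The default context carries 28 digits, so this product is exact (the IPR).
ipr = a * b

# Round the IPR into a 4-digit destination format.
four_digit = Context(prec=4, rounding=ROUND_HALF_EVEN)
rounded = four_digit.plus(ipr)

# The difference between the stored result and the IPR is the quantization error.
quantization_error = rounded - ipr

print(ipr, rounded, quantization_error)
```

The three low-order digits of the exact product are lost in the destination format, and the difference is the quantization error that a long computation accumulates as noise.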


Are quantization errors data dependent? 


Mathematical analysis of an integer system shows 
that quantization errors due to rounding have a mean 
value of one-quarter the value of the least significant 
bit. The relative error at each rounding thus depends 
on the magnitude of the operand being rounded. There- 
fore, as the magnitude of the operand decreases, the 
relative quantization error increases. The same is true 
of a block floating-point system, in which denormalized 
operands may contain leading zeros. In integer and 
block-floating-point systems, therefore, the errors are 
data-dependent, and for this reason error analysis is 
both difficult and time-consuming. 

In true floating-point systems, however, operands 
are generally normalized, so the relative quantization 
errors are the same, regardless of the magnitude of the 
operands. Quantization error analysis in floating-point 
systems is thus data independent and therefore doesn’t 
require complicated worst-case simulations. 
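
A small decimal sketch can make the contrast concrete. It assumes a fixed-point format whose LSB is 1 and a floating-point format with a 4-digit normalized fraction; the function names and the sample operands are hypothetical.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

def fixed_point_relative_error(value: Decimal) -> Decimal:
    """Round to the nearest integer (LSB = 1) and return |error| / |value|."""
    rounded = value.to_integral_value(rounding=ROUND_HALF_EVEN)
    return abs(rounded - value) / abs(value)

def floating_point_relative_error(value: Decimal, digits: int = 4) -> Decimal:
    """Round to `digits` significant digits and return |error| / |value|."""
    rounded = Context(prec=digits, rounding=ROUND_HALF_EVEN).plus(value)
    return abs(rounded - value) / abs(value)

# Hypothetical operands: the same digits at three different magnitudes.
for text in ("4321.4", "43.214", "4.3214"):
    v = Decimal(text)
    print(text, fixed_point_relative_error(v), floating_point_relative_error(v))
```

In this sketch the fixed-point relative error grows as the operand shrinks, while the normalized floating-point error stays the same at every magnitude.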

Floating-point systems can suffer from a computational
drawback known as the “operand ordering problem.”
Consider the addition of three floating-point numbers:
A (=1), B (=2^24), and C (=-2^24). You may find that
(A+B)+C=0, although A+(B+C)=1. This result clearly
violates the associative law of addition. The
discrepancy occurs because the floating-point format
doesn’t have enough bits to accommodate the
intermediate result of the first calculation (A+B). The
hardware has to round the IPR, 2^24+1, to the nearest
representable number, which is 2^24. Errors of this kind
are inevitable whenever the IPR has to be rounded to
fit the destination format, although they would usually
be considered so small as to be unimportant.
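
The operand ordering problem is easy to reproduce in software. The sketch below assumes IEEE single-precision operands A = 1, B = 2^24, and C = -2^24, and emulates single-precision addition by rounding each double-precision sum back to 32 bits with `struct`; the helper names are hypothetical.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_single(x: float, y: float) -> float:
    """Add two single-precision values and round the sum back to single precision."""
    return to_single(x + y)

A = to_single(1.0)
B = to_single(2.0 ** 24)
C = to_single(-(2.0 ** 24))

left = add_single(add_single(A, B), C)    # (A+B)+C: A is lost when A+B is rounded
right = add_single(A, add_single(B, C))   # A+(B+C): B and C cancel exactly first

print(left, right)
```

Reordering the operands so that the large terms cancel first preserves the small operand, which is why the order of accumulation can matter in long sums.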




You can minimize rounding errors (although, as the 
previous example shows, you can’t entirely remove 
them) by a judicious choice of rounding mode. Some 
floating-point standards allow you to select from among 
several rounding modes the one that best suits your 
operation. All of the commonly used floating-point 
standards support one or more of four modes: 

• Round-to-nearest mode replaces the IPR with the
closest representation that fits in the destination 
format. In the case of an IPR that falls exactly 
halfway between two representations, the IEEE 
standard rounds the IPR to the representation 





Dealing with underflow and overflow

For the rare cases in which the result of a calculation
is too large or too small to be represented, you must
have previously specified the way in which your system
will deal with that result. In short, your system must
handle the related problems of underflow and overflow.

Underflow arises when the rounded result of an
operation is a number between zero and the smallest
representable normalized number. You can handle such a
number in one of two ways: You can set the number to
zero (sudden underflow), or you can represent the
rounded result by a denormalized number (gradual
underflow).

Overflow occurs when the rounded result of an operation
is greater than the largest representable number. You
can handle this problem by setting the result to
infinity, which implicitly terminates a chain of
calculations, or by saturating the result to the
largest representable number (correctly signed).

It’s important to know which of the various methods
your system supports, because in some applications
sudden underflow or saturated overflow can destroy the
accuracy of an entire series of calculations. The IEEE
standard, for example, treats underflows by invoking
the gradual-underflow method, while the IBM and DEC
standards deal with only sudden underflow. Sudden
underflow is generally the fastest method of treating
underflows and is acceptable in the majority of systems
because high accuracy is seldom required for very small
numbers. Sudden underflow can produce quantization
errors almost as large as the smallest normalized
number, but usually you can treat these errors as
insignificant.

The gradual-underflow method creates much smaller
errors because it rounds results to a denormalized
number. On the other hand, gradual underflow is more
difficult and more expensive to implement than sudden
underflow, a drawback you’ll have to weigh against the
advantage of accurate results over a wider range of
numbers. Gradual underflow is generally best for
iterative applications in which you drive a residual
value to zero and for which you require maximum
possible accuracy. When such a residual value
underflows gradually to zero, you know that it’s
negligible compared with every normalized number.

For handling overflow, data-processing applications
generally set the result to infinity, because in a
high-accuracy mathematical model a saturated result
could destroy the accuracy of an entire series of
calculations. In real-time digital signal processing,
however, it’s generally preferable to saturate the
result and continue the chain of calculations. In the
analysis of radar returns, for example, you would
certainly not want a single anomalous return to bring
the entire processing sequence to a halt by introducing
an operand (an infinity) that would be useless in
further processing. In this and similar applications,
it’s often better to have an approximately correct data
point than no data point at all.

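
The two underflow policies and the two overflow policies can be compared in a few lines. This is a simplified sketch that assumes the IEEE single-precision limits; the function names are hypothetical, and the test for underflow is applied to the unrounded value rather than to the rounded result, which is a simplification.

```python
import math
import struct

MIN_NORMAL = 2.0 ** -126                       # smallest normalized single
MAX_FINITE = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest finite single

def to_single(x: float) -> float:
    """Round to IEEE single precision; tiny values become denormalized numbers."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def handle_underflow(x: float, gradual: bool) -> float:
    """Apply gradual or sudden underflow to a small, infinitely precise result."""
    if x != 0.0 and abs(x) < MIN_NORMAL:
        if gradual:
            return to_single(x)           # nearest denormalized number (or zero)
        return math.copysign(0.0, x)      # sudden underflow: flush to zero
    return to_single(x)                   # not an underflow (large x is not handled here)

def handle_overflow(x: float, saturate: bool) -> float:
    """Saturate an overflowing result or replace it with a signed infinity."""
    if abs(x) > MAX_FINITE:
        limit = MAX_FINITE if saturate else math.inf
        return math.copysign(limit, x)
    return to_single(x)

tiny = 2.0 ** -130
print(handle_underflow(tiny, gradual=True), handle_underflow(tiny, gradual=False))
print(handle_overflow(1e40, saturate=True), handle_overflow(1e40, saturate=False))
```

With gradual underflow the tiny value survives as a denormalized number; with sudden underflow it becomes zero. The saturated overflow result stays usable in later arithmetic, whereas the infinity propagates through every subsequent operation.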
TABLE 2—NUMBER REPRESENTATION
IN FOUR FLOATING-POINT STANDARDS

IEEE FORMAT

BIT      31      30 ... 23           22 ... 0
WEIGHT           2^7 ... 2^0         2^-1 ... 2^-23
FIELD    SIGN    BIASED EXPONENT     FRACTION
         (S)     (E)                 (F)

E = 0 AND F = 0 ........ V = (-1)^S * 0 (-0, +0)
E = 0 AND F ≠ 0 ........ V = (-1)^S * 0.F * 2^-126 (DENORMALIZED)
0 < E < 255 ............ V = (-1)^S * 1.F * 2^(E-127) (NORMALIZED)
E = 255 AND F = 0 ...... V = (-1)^S * ∞ (-∞, +∞)
E = 255 AND F ≠ 0 ...... V = NaN (NOT-A-NUMBER)

(a)

DEC FORMAT

FIELD    SIGN    BIASED EXPONENT     FRACTION
         (S)     (E)                 (F)

S = 1 AND E = 0 ........ V = DEC RESERVED OPERAND
S = 0 AND E = 0 ........ V = 0
E ≠ 0 .................. V = (-1)^S * 0.1F * 2^(E-128) (NORMALIZED)

(b)

IBM FORMAT

BIT      31      30 ... 24           23 ... 0
WEIGHT                               2^-1 ... 2^-24
FIELD    SIGN    BIASED EXPONENT     FRACTION
         (S)     (E)                 (F)

F = 0 .................. V = (-1)^S * 0 (-0, +0)
F ≠ 0 .................. V = (-1)^S * 0.F * 16^(E-64)

(c)

MIL-STD-1750A FORMAT

BIT      31 ... 8                    7 ... 0
WEIGHT   ... 2^-22 2^-23             2^7 ... 2^0
FIELD    FRACTION                    EXPONENT
         (F)                         (E)

V = F * 2^E

(d)

having an LSB of zero, whereas the DEC standard
rounds the IPR to the representation that has the
greater magnitude.
• Round-to-minus-infinity mode rounds the IPR to
the closest representable value that is less than or
equal to the IPR.
• Round-to-plus-infinity mode rounds the IPR to
the closest representable value that is greater
than or equal to the IPR.
• Round-to-zero mode is analogous to truncation; it
rounds the IPR to the closest representable value
with a magnitude less than or equal to that of the
IPR.
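
The four modes, plus the two tie-breaking rules for round-to-nearest, can be tried out with Python's `decimal` module. This is a decimal analogy rather than a model of any chip: the destination format is assumed to keep three fraction digits, an even last digit stands in for "an LSB of zero," and the mode labels are hypothetical names used only in this sketch.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN)

MODES = {
    "nearest-ieee": ROUND_HALF_EVEN,    # ties go to the even last digit
    "nearest-dec": ROUND_HALF_UP,       # ties go to the greater magnitude
    "minus-infinity": ROUND_FLOOR,
    "plus-infinity": ROUND_CEILING,
    "zero": ROUND_DOWN,                 # truncation
}

def round_ipr(ipr: str, mode: str) -> Decimal:
    """Round a decimal IPR to three fraction digits in the named mode."""
    return Decimal(ipr).quantize(Decimal("0.001"), rounding=MODES[mode])

# An IPR that falls exactly halfway between two representable values.
for mode in MODES:
    print(mode, round_ipr("0.1225", mode), round_ipr("-0.1225", mode))
```

Only the halfway case separates the two round-to-nearest rules; for every other IPR they agree.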

As noted earlier, the various floating-point standards
specify different binary representations of
floating-point numbers, and you’ll have to match their
respective advantages and disadvantages to your own
computational problems. Four of the most common binary
floating-point standards, the IEEE, DEC, IBM, and
MIL-STD-1750A standards, all represent single-precision
floating-point numbers by means of 32-bit words having
the formats shown in Table 2. All four standards
support double-precision data, and some of these
standards also support other data types, such as
single-extended and double-extended data.

The IEEE working group presented the specifications
contained in proposed standard P754, draft 10.1, as a
robust standard for portable floating-point software.
This proposed standard has received wide acceptance,
and it’s likely to form the basis of a large number of
future hardware implementations. P754 has several
features that aren’t found in other standards. In
particular, +0, -0, and infinities are all valid
operands. Operations performed on infinities signal no
exceptions unless the operation itself is invalid. The
standard allows the use of a special operand known as
NaN (Not-a-Number). An implementation should interpret
NaNs as signals rather than numbers, and it should use
NaNs to indicate invalid operations or to pass status
information through a series of calculations. Also, the
standard accepts denormalized numbers as a
representation of a result that is less than the
smallest normalized number.
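
The IEEE rules in Table 2 translate directly into a short decoder. The sketch below is an illustration, not part of the standard: the function name is hypothetical, and the result is returned as a Python float so that it can be checked against the `struct` module.

```python
import math
import struct

def decode_ieee_single(word: int) -> float:
    """Evaluate a 32-bit IEEE single-precision word using the rules of Table 2."""
    s = (word >> 31) & 0x1      # sign bit
    e = (word >> 23) & 0xFF     # biased exponent
    f = word & 0x7FFFFF         # 23-bit fraction field
    sign = -1.0 if s else 1.0
    if e == 0:
        # Zero (F = 0) or denormalized (F != 0): no hidden bit, exponent -126.
        return sign * (f / 2**23) * 2.0**-126
    if e == 255:
        # Infinity (F = 0) or NaN (F != 0).
        return sign * math.inf if f == 0 else math.nan
    # Normalized: hidden bit of 1, exponent bias of 127.
    return sign * (1.0 + f / 2**23) * 2.0**(e - 127)

# Cross-check one word against the platform's own conversion.
word = 0xC0490FDB
reference = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(decode_ieee_single(word), reference)
```

The zero, denormalized, infinity, and NaN cases are the ones that set the IEEE format apart from the other three formats discussed here.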

The DEC standard is implemented in all DEC VAX 
minicomputers; the VAX Architecture Manual contains 
the full specifications of the standard. Conceptually 
simpler than the IEEE standard, the DEC standard 
has no provisions for infinities or denormalized num- 
bers, and it has only a single representation for zero. 
The DEC standard does, however, incorporate DEC 
reserved operands, which are analogous to IEEE 
NaNs. 
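
For comparison, here is a matching sketch for the DEC single-precision format, using the value rule V = (-1)^S * 0.1F * 2^(E-128). It assumes the logical field order sign (bit 31), biased exponent (bits 30-23), and fraction (bits 22-0); that layout and the function name are assumptions of the sketch, and the VAX in-memory word ordering is not modeled.

```python
def decode_dec_single(word: int) -> float:
    """Evaluate a 32-bit DEC single-precision word (assumed logical field order)."""
    s = (word >> 31) & 0x1      # sign bit
    e = (word >> 23) & 0xFF     # biased exponent, bias 128
    f = word & 0x7FFFFF         # 23-bit fraction field
    if e == 0:
        if s == 1:
            raise ValueError("DEC reserved operand")
        return 0.0              # the single representation of zero
    # The hidden bit sits just right of the binary point: 0.1F = 1/2 + F/2^24.
    significand = 0.5 + f / 2**24
    return (-1.0 if s else 1.0) * significand * 2.0**(e - 128)

print(decode_dec_single(0x40800000))   # E = 129, F = 0
```

Because there are no infinities or denormalized numbers, every nonzero exponent value encodes an ordinary normalized number.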

An important feature common to both the IEEE and
the DEC standards is the existence of a hidden bit. 
Both standards specify that all operands will be norma- 
lized (except for denormalized numbers in the IEEE 
format). This stricture implies that the leading fraction 
bit must always be a one. This bit would not only be 
redundant if included in the 32-bit representation, but 
it would actually reduce the precision of the number, so 
its presence is assumed. In the case of IEEE denor- 
malized numbers, the biased exponent is zero, thereby 



TABLE 3—COMPARISON OF FLOATING-POINT STANDARDS

[Table: one column per standard (IEEE, DEC, IBM, MIL-STD-1750A); rows for smallest positive number, smallest negative number, dynamic range, and precision.]
VLSI floating-point µP for recursive algorithms


One example of floating-point 
hardware that handles recursive 
algorithms is the Am29325 from 
Advanced Micro Devices. The 
processor integrates a 32-bit 
adder/subtracter, a multiplier, 
and a data path on a single chip. 
This level of integration reduces 
the processing overhead in- 
curred by chip sets comprising 
separate ALU and multiplier 
chips. The internal feedback 
paths facilitate the implementa- 
tion of such recursive algorithms 
as sum-of-products and Newton- 
Raphson division. 

The processor supports both 
the IEEE and DEC floating- 
point formats. The instruction 
set includes instructions that 
convert data from IEEE format 
to DEC format and vice versa, 
as well as instructions that
convert data to and from 32-bit
integer format.


Three functional blocks 


The processor has three main
functional blocks (Fig A): a
floating-point ALU, a status-flag
generator, and a 32-bit internal
data path. The ALU is fully
combinatorial, and it performs
all instructions in a single cycle.
The eight instructions handle
floating-point R+S, R-S, R×S,
and 2-S operations as well as
the format conversions.

The 2-S instruction forms the
core of the Newton-Raphson di- 
vision algorithm, which performs 
division by a sequence of itera- 
tions. In this and other iterative 
algorithms, intermediate results 
are retained in the R or S regis- 
ter, thereby eliminating the 
need for any off-chip registers 
and minimizing the number of 
required data transfers. 
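
The role of the 2-S operation is easiest to see in code. The sketch below computes a reciprocal with the Newton-Raphson iteration x ← x × (2 − s × x), using only multiply and 2-S steps, and then forms a quotient with one more multiply. The seed formula is a textbook linear approximation chosen for this sketch; the function names and the iteration count are likewise assumptions, not details of the Am29325.

```python
import math

def reciprocal(s: float, iterations: int = 4) -> float:
    """Approximate 1/s with the Newton-Raphson iteration x <- x * (2 - s*x)."""
    if s == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(s))              # abs(s) = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - (32.0 / 17.0) * m    # assumed seed: linear estimate of 1/m
    for _ in range(iterations):
        t = m * x                          # multiply (R x S)
        t = 2.0 - t                        # the 2-S step
        x = x * t                          # multiply (R x S)
    return math.copysign(math.ldexp(x, -e), s)

def divide(r: float, s: float) -> float:
    """Form r/s as r times the reciprocal of s."""
    return r * reciprocal(s)

print(divide(7.0, 3.0))
```

Each pass roughly doubles the number of correct bits, so the number of iterations needed depends on how good the seed is.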

Three programmable I/O
modes allow the Am29325 to
interface with a variety of
systems. The 32-bit, 2-input-bus
mode uses three separate 32-bit
buses (R, S, and F) for
high-speed, nonmultiplexed
operation; in this case, the R
and S registers are configured
as independent 32-bit ports. In
the 32-bit, 1-input-bus mode,
both the R and S registers are
connected to a common 32-bit
input bus; the host multiplexes
operands onto this bus. In the
16-bit, 2-input-bus mode, 32-bit
operands are multiplexed onto
the corresponding 16-bit buses
(low-order bits first).

Fig A—This VLSI floating-point processor is fast because it contains all the major
components for 32-bit operations on a single chip. It has one input for an external clock
and 17 inputs for instruction-select and control functions.


Six flags and four modes 

The status-flag generator pro- 
vides six fully decoded flags. 
Four of these flags report excep- 
tional conditions, as defined in 


the IEEE standard. The remain- 
ing two flags identify zero-val- 
ued or nonnumerical results. 

The Am29325 implements the 
four IEEE-mandated rounding 
modes: round-to-nearest, round- 
to-plus-infinity, round-to-minus- 
infinity, and round-to-zero. The 
same four modes are supported 
for the DEC standard, except 
that when the infinitely precise 
result is halfway between two 
representable numbers, the 
IEEE round-to-nearest mode 
rounds to the closest representa- 
tion with an LSB of zero, 
whereas the DEC round-to-near- 
est mode rounds to the value 
with the larger magnitude. 
instructing the system to assume that the value of the
hidden bit is also zero.

The IBM floating-point standard differs from its 
IEEE and DEC counterparts in several respects. It has 
no provision for infinities or reserved operands, al- 
though it does accept denormalized numbers. More 
important, however, are the absence of a hidden bit and 
the use of radix 16 rather than radix 2. Because the 
exponent of an IBM number is expressed as a power of 
16, the standard has a large dynamic range. For the 
same reason, however, numbers are spaced farther 
apart than in the other formats. This increased gran- 
ularity results in less precision than is provided by the 
IEEE and DEC formats. Also, the use of radix 16
allows as many as three leading zeros in the binary
fraction of a normalized number, even though the
leading hexadecimal digit is nonzero if the number is
expressed in hexadecimal format. The leading binary
zeros can cause the precision to vary from one operand
to another. This variation is known as wobbling.
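
Wobbling can be demonstrated by normalizing a few values in radix 16 and counting the fraction bits that remain significant. The sketch assumes a 24-bit fraction and an exponent bias of 64, with V = 0.F × 16^(E−64); the function names are hypothetical, only positive values are handled, and the exponent range is not checked.

```python
def ibm_fraction(x: float) -> tuple[int, int]:
    """Return (biased exponent, 24-bit fraction) for a positive value in a
    radix-16 format; low-order bits that do not fit are truncated."""
    if x <= 0.0:
        raise ValueError("sketch handles positive values only")
    exponent = 0
    while x >= 1.0:            # scale down until the fraction is below 1
        x /= 16.0
        exponent += 1
    while x < 1.0 / 16.0:      # scale up until the leading hex digit is nonzero
        x *= 16.0
        exponent -= 1
    return exponent + 64, int(x * 2**24)

def significant_bits(fraction: int) -> int:
    """Fraction bits that remain once leading binary zeros are discounted."""
    return fraction.bit_length()

for value in (0.5, 1.0, 2.0, 4.0, 8.0):
    e, f = ibm_fraction(value)
    print(value, e, hex(f), significant_bits(f))
```

A leading hexadecimal digit of 1 leaves three leading binary zeros, so such an operand carries only 21 significant fraction bits, whereas a leading digit of 8 or more keeps all 24.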


The MIL-STD-1750A standard, developed for use in
military systems, allows no reserved operands,
infinities, or denormalized numbers. Furthermore, the
use of a 2’s-complement fraction, rather than a
sign-magnitude representation as in the other three
formats, requires a somewhat different hardware
architecture.

The applications to which each of the four standards 
is best suited differ quite widely. Nevertheless, you can 
make a simple comparison (Table 3) between the 
standards, based on factors such as the largest and 
smallest representable numbers, the dynamic range, 
and the precision. Such a comparison can be useful in 
selecting the most suitable format for a given applica- 
tion. In most cases, however, the format to be used is 
determined by outside constraints, such as compatibili- 
ty with existing hardware or software. EDN 
Powerful math-processing chips configured with
high-speed memories and controllers form the core of a
floating-point math or array processor for small
computers. This second part of EDN’s 3-part
floating-point math series discusses the tradeoffs you
must make to add flexibility and speed to
array-processor designs.


Robert M Perlman, Advanced Micro Devices 


For such jobs as digital-signal processing, image pro- 
cessing, graphics, and scientific calculations, an array 
processor can take over repetitive arithmetic chores 
while your host computer performs control tasks and 
retrieves information. By employing a floating-point 
array processor, you also increase the math-processing 
power of your computer system. 

The basic array-processor design (Fig 1) contains an
arithmetic unit, a controller, data memory, program
memory, and a host interface (see box, “Array
processor vs general-purpose computer”). If you use
newer control, memory, and math chips, you can fit the
circuit on a single pc board. This array-processor
design uses an Am29325 floating-point processor chip,
which operates with either IEEE- or DEC-standard
single-precision data. The chip performs single-cycle
floating-point additions, subtractions,
multiplications, and format conversions at an 8-MHz
clock frequency.

Because the Am29325 chip contains a floating-point
arithmetic unit (AU), three 32-bit registers, two data
buses, and two data-selection multiplexers, you need
only a small amount of external hardware to design a
complete math- or array-processor circuit. In the
array-processor design, the Am29325 receives operands
from two high-speed memories. An 8k×32-bit RAM
provides input data for your algorithms, and it stores
intermediate and final results. An 8k×32-bit PROM
provides constant values for the algorithms.

Although you can design a circuit that specifically
controls the math chip and its associated memory chips,
you’ll find an equivalent circuit in the 2910A
microprogrammable controller chip. The 2910A chip is a
general-purpose controller; it’s not dedicated to
controlling the Am29325. The controller chip contains a
program counter, a loop counter, a LIFO stack, and
other circuits that access program instructions and
control the array processor in the basic design. The
controller provides an 11-bit address for the design’s
2k×64-bit microprogram memory, which contains the
instructions for your algorithms. Each algorithm
instruction contains 64 bits that the circuit divides
into seven groups of outputs:
• 11 jump address bits
• one address and write-enable multiplexer bit
• one write-enable control bit
• 13 RAM-address bits
• 13 PROM-address bits
• 24 miscellaneous control bits
• one interrupt-control line.

The microprogram memory routes its outputs
through an internal register and then to the rest of the
array-processing hardware. Although it may not be
obvious, the register at the microprogram memory’s
output helps maintain high-speed data processing. By
using a clocked register to hold the memory’s output
bits, the controller latches a 64-bit instruction while it
addresses the microprogram memory for the next
instruction. The memory’s output register therefore
permits the overlap of the instruction-fetch and
-execute operations, which saves processing time.

Fig 1—The Am29325 floating-point processor used in this design adheres to IEEE and DEC floating-point standards.

Reprinted with permission from EDN, January 23, 1986. Copyright 1986, Reed
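
One way to picture the 64-bit microinstruction is as a packed word with seven fields. In the sketch below the field widths follow the list above, taking the PROM address as 13 bits (an 8k PROM needs 13 address lines, and the seven groups then total 64 bits). The field names and their order within the word are hypothetical, since the article doesn't give the actual bit assignment.

```python
# Field widths follow the article; names and ordering are assumptions.
FIELDS = [
    ("jump_address", 11),
    ("address_we_mux", 1),
    ("write_enable", 1),
    ("ram_address", 13),
    ("prom_address", 13),
    ("misc_control", 24),
    ("interrupt", 1),
]
assert sum(width for _, width in FIELDS) == 64

def pack_microinstruction(values: dict[str, int]) -> int:
    """Pack named field values into one 64-bit microinstruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_microinstruction(word: int) -> dict[str, int]:
    """Split a 64-bit microinstruction word back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

example = {"jump_address": 0x123, "ram_address": 0x1ABC, "prom_address": 0x0042}
print(hex(pack_microinstruction(example)))
```

In hardware, the pipeline register simply holds these 64 bits for one cycle, and each group of bits is wired to the block it controls.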

Because it holds information for a pending operation, 
the microprogram memory’s output register is often 
referred to as a pipeline register. Array processors can 
contain a series of pipeline registers, the number of 
which depends on the architecture of the array pro- 
cessor and the maximum processing speed you need. 


Host interface links processors 

You must carefully choose your host-computer interface
circuits according to the type of system bus in your
computer. You can accommodate most general-purpose
computers by providing bus buffers for the address,
data, and control lines. You’ll also need a small amount
of control logic to manage the flow of information to and
from the array processor and the host computer. For
example, you can construct a Multibus interface by
using octal bus buffers and PAL chips. If your host
computer’s data bus contains fewer than 32 data bits,
you’ll need to convert the data to and from the 32-bit
format that the array processor requires. You can
include double-buffer latch circuits for the data inputs
to the array processor, and you can provide latches and
multiplexers on the processor’s data-output lines.

TABLE 1—BENCHMARK EXECUTION TIMES

OPERATION                      EXECUTION TIME
5-TAP FIR FILTER               1.125 µSEC
RADIX-2 FFT BUTTERFLY          1.25 µSEC
4×1 MATRIX ADDITION            1.0 µSEC
4×4 MATRIX MULTIPLICATION      14.0 µSEC

The host computer’s data bus provides the main link 
between the host and the array processor. Your com- 
puter starts a math operation by loading the RAM with 
raw data and then signaling the array processor to 
start a math-processing algorithm. After the processor 
runs an algorithm program, your host computer reads 
the RAM’s contents to obtain the results. 

To simplify the data-transfer operations to and from 
the host computer, the array processor goes into an 
idle, or standby, state when it isn’t running an algo- 
rithm program. Instead of controlling the processor’s 
data and control lines, the microprogram controller 
continuously runs a 1-microinstruction program loop.
In addition, the idle microinstruction switches the 
RAM’s address and write-enable multiplexers so that 
the RAM appears to be part of the host computer’s 
main memory. The host computer loads the desired 
input data into the data RAM, and it then loads the 
microprogram controller with the starting address of 
the algorithm you want to run. The microprogram 
controller then jumps to the preprogrammed sequence 
of microinstructions for the algorithm. The algorithm’s 
first microinstruction reconfigures the data RAM so 
that only the array processor can address it. When the 
algorithm completes its tasks, it sends an interrupt 
signal to the host processor, switches the data RAM 
back to the host, and executes the 1-instruction standby 
loop. 


Once you’re sure the array processor is operating
properly, you can test the operating speed of your
circuit by using benchmark programs tailored to specific
tasks (Table 1). The benchmark times were calculated
for the array processor with an 8-MHz clock
frequency. The basic processor performs one data-RAM
operation (read or write) per clock cycle.


Modifications improve performance 


Although the basic array-processor circuit works
well, you can improve its performance. Because the
simple array processor takes data addresses directly
from the program memory, the program memory must
contain a section of microcode for each iteration of an
algorithm. For example, a program that performs 20
matrix multiplications contains a separate section of
microprogram code for each multiplication step. Each
code section contains specific addresses for data and
coefficients (Fig 2a). The in-line coding approach
therefore wastes program-memory space.

Fig 2—You can implement the program memory in two ways:
Either you can include steps for each iteration of your algorithm (a),
or you can add an address-generator circuit (b) that lets you use only
one section of code for all iterations. The address generator locates
specific values and coefficients in memory automatically.

One improvement found in virtually every array 
processor is a data-address-generator circuit that gen- 
erates the necessary data and coefficient addresses 
within the array processor. The address-generator 
hardware reduces the amount of microprogram memo- 
ry you'll need for an algorithm. By using such hard- 
ware, the processor performs multiple iterations of an 
operation by looping through the same section of micro- 
code as many times as necessary (Fig 2b). 

Depending on your specific tasks, you can choose a 
data-address generator that fits a specific algorithm,
such as the fast Fourier transform (FFT), or you can 
choose a general-purpose addressing device. Some 

Array processor vs general-purpose computer


To understand better what an 
array processor does, consider 
first the strengths and short- 
comings of general-purpose com- 
puters. General-purpose comput- 
ers incorporate the standard Von 
Neumann architecture and per- 
form a variety of tasks. Such 
computers perform instruction- 
fetch and instruction-execution 
tasks sequentially, with instruc- 
tions and data available in one 
memory array (Fig A). 

Consider the calculation of the 
sum of products, a common task 
in signal-processing and matrix- 
manipulation algorithms. The 
basic sum-of-products equation is 


$$Y = \sum_{i=1}^{N} k_i x_i$$

where $k_i$ and $x_i$ represent
coefficients and data stored in
memory, respectively. The
sum-of-products computation
represents a large class of
array-processing problems that
share three fundamental
characteristics: First,
they involve repetitive computa- 
tions on arrays of data. Second, 
the underlying control structure 
is simple, having many loops but 
no conditional branches. Third, 
the math steps are memory-in- 
tensive—each calculation re- 
quires one data point and one 
constant from memory. 

To evaluate a product term,
the computer fetches x_i and k_i,
multiplies them, and then adds
the result to the running total.
Each step requires an
instruction-fetch cycle and an
instruction-execution cycle.
Although specific details vary
from computer to computer, in
general even primitive math
operations require many cycles.

Fig A—A general-purpose computer memory stores instructions and data in the same
block. The computer must access instruction and data values sequentially.
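
Written as a software loop, the sum-of-products calculation looks like the sketch below; the function name is hypothetical. Each pass through the loop corresponds to the fetch, multiply, and accumulate steps just described.

```python
def sum_of_products(k: list[float], x: list[float]) -> float:
    """Compute Y as the sum of k[i] * x[i], one multiply-accumulate per term."""
    if len(k) != len(x):
        raise ValueError("k and x must have the same length")
    total = 0.0
    for k_i, x_i in zip(k, x):   # fetch one coefficient and one data point
        product = k_i * x_i      # multiply them
        total += product         # add the result to the running total
    return total

print(sum_of_products([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

On a general-purpose computer every line of the loop body costs at least one instruction fetch and one execution cycle, which is the overhead an array processor is designed to hide.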


Overlapping operation 

Traditionally, Von Neumann- 
type computers perform each 
step sequentially. Array pro- 
cessors, however, provide a de- 
gree of parallelism by doing 
more than one thing at a time. 
When data and program steps
reside in separate memories, an
arrangement that fits the
Harvard-architecture model,
instruction- and data-fetch
operations can overlap (Fig B).
In the case of the
sum-of-products operation, the
array processor fetches the
input operands at the same time
that it fetches the instruction
that performs the
multiplication. Most array
processors also overlap
instruction-fetch and
instruction-execution
operations.

For highly regular, math-in- 
tensive algorithms, the overlap- 
ping results in high-speed opera- 
tion, but such operation can be 
inefficient when the algorithm 
includes conditional branches. If, 


for example, a program calls for 
a conditional branch to another 
instruction, the instruction fol- 
lowing the branch instruction 
may be in the instruction queue. 
If it is in the queue, the comput- 
er discards it. Array processors 
are therefore best suited to the 
many number-crunching algo- 
rithms that require little or no 
conditional branching. 

Because array processors pro- 
vide parallel operation, you can 
optimize them for a specific 
math process. For example, an 
array processor designed for a 
sum-of-products operation may 
contain a multiplier and adder 
circuit, which evaluates a prod- 
uct term in one cycle. Because 
array processors perform paral- 
lel operations, programming the 
processors is more demanding 
than programming a general- 
purpose computer. However, the 
resulting increase in computa- 
tional power often justifies the 
additional programming effort. 
Instead of programming in Basic 
or in assembly language, you'll 
use a microcode that controls in- 
dividual circuits and operations 
in the array processor. Although 
such programming is demand- 




EDN January 23, 1986 




ing, it gives you complete con- 
trol of the array processor’s in- 
ternal operations. 


Five functional blocks 


Array processors typically re- 
ceive data and instructions from 
a host machine—usually a 
general-purpose computer. Al- 
though specific array-processor 
architectures vary greatly, most 
processors contain at least five 
functional blocks: an arithmetic 
unit, data memory, a controller, 
program memory, and a host 
interface. 

The heart of the processor is 
the arithmetic unit, which con- 
trols the data paths and per- 
forms arithmetic operations. De- 
pending on your application, the 
arithmetic unit performs fixed- 
point operations, floating-point 
operations, or both. For some 
high-speed, real-time applica- 
tions, such as radar- and video- 
information processing, array 
processors operate on 12-, 16-, 
or 24-bit fixed-point data. 
However, the trend is toward 32-bit 
floating-point data processing. 
The data-memory—usually 
banks of high-speed RAM or 
PROM—supplies operands to the 
arithmetic unit and stores re- 
sults from the arithmetic unit. 
The data memory can have mul- 
tiple data ports, depending on 
how fast the memory chips must 
supply operands and accept re- 
sults. If it doesn’t have enough 
ports or enough speed, the data 
memory can become a process- 
ing bottleneck, leaving the arith- 
metic unit starved for operands. 


Controller is simple 


The controller sequences the 
array processor through its op- 
erations. Because most array- 
processing algorithms have mod- 
est sequencing requirements, 
the controller isn’t complex. 
Controllers provide a program 
counter (PC) that you increment 
to access the next 
program-memory word. You can also load 
the PC with the program 
memory’s output to force the 
controller to jump to a different part of 
the program. The controller 
includes a loop counter, which 
counts repeated operations. 
Depending on the array processor’s 
sophistication, the controller 
may incorporate circuits that 
control nested subroutines, 
interrupts, and conditional-branch 
operations. 

Fig B—An array processor’s memory provides separate storage blocks for 
instructions and data. The separate storage areas let the control circuits access instructions 
and data in parallel. 

The program memory stores 
the array processor’s microcode, 
which controls the other pro- 
cessor elements. Like the data 
memory, the program memory 
can be RAM or PROM. Use 
PROMs when the algorithms are 
well-defined and unlikely to 
change. Use RAM during algo- 
rithm development. The re- 
sources in the array processor 
determine the microcode memo- 
ry’s bit width. For example, a 
60-bit-wide program memory 
provides 30 bits that control the 
arithmetic unit, 15 bits that 
transfer information to the con- 
troller (including a 12-bit jump 
address), and 15 bits that control 
other internal array-processor 
resources. 
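
As a rough illustration of the 60-bit example above, the following Python sketch packs and unpacks one microword. The article gives only the field widths; the field order, the split of the 15 controller bits into a 3-bit opcode plus the 12-bit jump address, and the names `pack` and `unpack` are assumptions made for this sketch.

```python
# Hypothetical 60-bit microword layout (the field order is an assumption):
#   30 bits  arithmetic-unit control
#   15 bits  controller field (assumed: 3-bit opcode + 12-bit jump address)
#   15 bits  other array-processor resources
ALU_BITS, CTL_BITS, MISC_BITS, JUMP_BITS = 30, 15, 15, 12

def pack(alu, ctl_op, jump, misc):
    """Assemble one microword, checking that every field fits its width."""
    assert alu in range(2 ** ALU_BITS)
    assert ctl_op in range(2 ** (CTL_BITS - JUMP_BITS))
    assert jump in range(2 ** JUMP_BITS)
    assert misc in range(2 ** MISC_BITS)
    ctl = ctl_op * 2 ** JUMP_BITS + jump
    return (alu * 2 ** CTL_BITS + ctl) * 2 ** MISC_BITS + misc

def unpack(word):
    """Split a microword back into (alu, ctl_op, jump, misc)."""
    word, misc = divmod(word, 2 ** MISC_BITS)
    alu, ctl = divmod(word, 2 ** CTL_BITS)
    ctl_op, jump = divmod(ctl, 2 ** JUMP_BITS)
    return alu, ctl_op, jump, misc

word = pack(alu=0x2AAAAAAA, ctl_op=0b101, jump=0x7FF, misc=0x1234)
print(f"{word:015x}", unpack(word))
```

A wider microword costs more program-memory bits per step but lets one instruction steer every resource in the same cycle, which is the trade-off the paragraph above describes.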

The host interface transfers 
data and instructions between 
the host computer and the array 
processor—usually by DMA 
operations. The host computer 
sends the array processor a 
block of data and an instruction 
word that selects a processing 
algorithm. After processing the 
data, the array processor 
transfers the results to the host 
computer. 
An array processor can include pipeline registers 
that let the circuit overlap tasks. 


[Figure 3: block diagram; recoverable labels: 6-PORT RAM, two 64 x 32-BIT 4-PORT RAMs (2 Am29334s each), Am29325] 


Fig 3—A 6-port RAM speeds data transfers so that two 
math-processor chips can operate independently. The chips can process 
data from the memory or from one another. 


array processors provide both a general-purpose and a 
dedicated address-generator circuit. You'll find sepa- 
rate address generators for data and coefficient memo- 
ries in array processors that provide extremely high 
processing speeds. 

An address generator reduces the size of your array 
processor's program memory, and it increases the 
processor’s speed. To increase processing speed fur- 
ther, consider adding arithmetic hardware to your 
design so the processor can do several computations in 
parallel. In the basic array-processor design, the arith- 
metic unit performs one operation at a time—for exam- 
ple, sums of products, which involve alternate addition 
and multiplication operations. The array processor per- 
forms the multiplication and addition operations se- 
quentially. 

The throughput of the basic array processor is 250 
nsec per floating-point product term; to increase that 
speed you can gang two 29325 floating-point math 
processors (Fig 3). The processors communicate 
through a 6-port RAM. When the circuit incorporates a 
multiport RAM, the floating-point processors can each 
access two input operands and store one result during 
each clock cycle. Because data produced by one float- 
ing-point processor is accessible to the other, you can 
double the processing speed for such algorithms as 
sum-of-products: One processor produces product 
terms, while the other processor sums and accumulates 
them. Of course, you can choose other math-chip config- 
urations that better suit specific array-processing 
tasks. Keep in mind, however, that although you gain 
higher-speed operations by providing parallel math 
chips, your programming tasks grow. Coordinating the 
software operations of several parallel math chips can 
be difficult. 


Memory expansion increases throughput 


When you upgrade the arithmetic unit by adding 
parallel math chips, you must improve the data memory 
as well. The data-memory configuration in the basic 
array processor limits processing speed because the 
processor only accesses one constant and only performs 
one RAM-read or -write operation per clock cycle. To 
let the array processor perform operations that require 
two operands from RAM in the same cycle, or that 
require RAM-read and -write operations during the 
same cycle, you must upgrade the memory. Possible 
enhancements include converting the coefficient PROM 
to high-speed RAM, running the data RAM at twice 
the processor’s speed to allow single-cycle reading and 
writing, or replacing the data RAM with a 2-port 
RAM. 

In addition to high processing speeds, some applica- 
tions may require rapid data transfers between the 
array processor and the host computer. There are at 
least two ways of speeding the transfer of data from the 
host to the array processor. First, you can replace the 
array processor’s data RAM with a 2-section memory 
(Fig 4) that gives the host computer access to one 
section while the array processor uses the other. When 
the array processor completes its task, it switches 
between the buffers. The host obtains the results from 
the array processor’s old buffer, while the processor 
operates with the data in the host’s old buffer. The host 
computer’s and the array processor’s operations are no 
longer sequential; instead, they overlap. You'll have to 
pay careful attention to the manner in which the array 
processor controls the 2-section memory, because you 
don’t want to switch buffers while the host or the array 
processor is still using one. 

[Figure 4: block diagram; recoverable labels: HOST DATA, FROM DATA PROM] 

Fig 4—A 2-section memory offers a speed enhancement. The host processor reads or writes from one section, while the array processor 
processes the data in the other section. 

A second approach involves bypassing the host com- 
puter and letting the array processor take data directly 
from the data source—for example, an A/D converter. 
The processor uses the data and passes results to the 
host computer. 

The 2-section-memory and direct-data-input tech- 
niques aren’t mutually exclusive. In a given application, 
you might send data from an A/D converter directly to 
a 2-section memory. In this case, when the A/D con- 
verter’s memory is full, it switches the memory section 
to the array processor. 
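
The 2-section memory can be modeled in software as a pair of buffers with a guarded swap. The sketch below is only an illustration of the idea; the class name `PingPongMemory`, the `busy` flags, and the squaring "algorithm" are hypothetical and stand in for whatever handshake and processing the real hardware uses.

```python
class PingPongMemory:
    """Two-section memory: the host owns one section, the array processor the other."""

    def __init__(self, size):
        self.sections = [[0.0] * size, [0.0] * size]
        self.host_section = 0                     # section the host may touch
        self.busy = {"host": False, "ap": False}  # set while a side is mid-transfer

    def host_buffer(self):
        return self.sections[self.host_section]

    def ap_buffer(self):
        return self.sections[1 - self.host_section]

    def swap(self):
        # Refuse to switch sections while either side is still using one.
        if self.busy["host"] or self.busy["ap"]:
            raise RuntimeError("cannot switch sections while one is in use")
        self.host_section = 1 - self.host_section

mem = PingPongMemory(4)
mem.host_buffer()[:] = [1.0, 2.0, 3.0, 4.0]   # host loads a block
mem.swap()                                     # hand the block to the array processor
ap = mem.ap_buffer()
ap[:] = [v * v for v in ap]                    # processor works on it in place
mem.swap()                                     # hand the results back to the host
results = list(mem.host_buffer())
print(results)
```

In a real design the host would be filling the other section while the processor works, which is where the overlap comes from; the sketch runs the two sides one after the other only to keep it short.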


Dividing the work load 


By adding both direct-data input and output ports to 
your array-processor design, you can connect several 
processors in series, letting each one perform a subset 
of your algorithm. After it processes a piece or block of 
information, each processor passes results to the next 
processor in the chain. 




The basic array processor performs addition, sub- 
traction, multiplication, and format-conversion opera- 
tions. For complex and transcendental operations, 
you'll need specific microcode routines that offer cosine, 
sine, and other functions. Standard algorithms are 
available, so your programming tasks aren’t insur- 
mountable. Part 3 of EDN’s floating-point series will 
explore transcendental functions and tell how to imple- 
ment them. EDN 
math functions 








This final article in a 3-part series describes how to 
incorporate a floating-point processor into your system. 
It discusses criteria for the selection of the algorithms 
you'll use, and in particular it details the 
methods used to implement transcendental functions. 





David Quong, Advanced Micro Devices 


If your application must perform a variety of math 
functions at high speeds on a wide range of input data, 
consider designing a math subsystem based upon a 
VLSI floating-point processor. A floating-point pro- 
cessor, a microsequencer, RAM, and ROM, configured 
as shown in Fig 1, together with the appropriate 
algorithms, will allow you to perform most math func- 
tions at real-time speeds with high precision and a very 
large dynamic range. A system of this type will outper- 
form even the fastest floating-point coprocessor. 

The choice of algorithms is an important step in the 
realization of your math processor. You can choose from 
a variety of methods for implementing transcendental 
and other math functions: The Taylor series, the 
Chebyshev series expansion, and the Newton-Raphson 
approximation are just a few of the many possible 
approaches. Which algorithm is the best one for your 
particular application will depend upon what functions 
you want to perform, the hardware architecture you are 


using, and the system throughput and accuracy you 
expect to receive. 

Many designers select the Taylor series for perform- 
ing math functions. This well-known method allows you 
to find equations for various functions in most books of 
math tables. The Taylor series has a major drawback, 
however: It has a nonuniform convergence rate in the 
number of terms needed to achieve a desired accuracy. 
Consider, for example, the Taylor series expansion of 
the sine function: 

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$

For values of x near zero radians, this equation 
converges very quickly, but as x becomes larger, you'll 
need a larger number of terms to evaluate sin(x) to the 
same accuracy that you obtained for the smaller values. 
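
To make the nonuniform convergence concrete, the short Python sketch below counts how many Taylor terms are needed before the next term drops below a tolerance. The function name `taylor_sin_terms` and the stopping rule (stop when the next term is no larger than 2^-24) are choices made for this illustration, not part of the article.

```python
import math

def taylor_sin_terms(x, tol=2.0 ** -24):
    """Return (approximation, number of terms used) for the Taylor series of sin(x)."""
    term = x          # first term of the series
    total = 0.0
    n = 0
    while abs(term) > tol:
        total += term
        n += 1
        # Next term: multiply by -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
    return total, n

for x in (0.1, 1.0, 3.0):
    approx, n = taylor_sin_terms(x)
    print(f"x={x}: {n} terms, error {abs(approx - math.sin(x)):.2e}")
```

Running it shows the term count growing as x moves away from zero, which is the drawback that a fixed-length microcode routine has to budget for.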

The Chebyshev expansion method, like the Taylor 
method, produces a polynomial approximation, but it’s 
not so well known. The generation of the Chebyshev 
approximation for a particular function is more complex 
than for the Taylor series, but the resulting polynomial 
is just as easy to implement. The major advantage of 
the Chebyshev method is that it has uniform conver- 
gence. Moreover, for any given function, over the 
operating range of the Chebyshev series this method 
yields smaller errors than almost any other method. 
You can usually determine by inspection the upper 
bound of the error; the error of the truncated series 
cannot exceed the sum of the absolute values of the 
remaining Chebyshev coefficients. (For details of the 
derivation of the Chebyshev series, see box, “Deriving 
a Chebyshev series.”) 

A math-processing subsystem incorporating 
a VLSI floating-point processor will outperform 
even the fastest available floating-point coprocessor. 


Iteration handles simple functions 

For some simple functions such as division and 
square-root extraction, the Newton-Raphson method, 
an iterative approach for approximating such functions, 
works well. When using this or any other iterative 
method, you have to start with a seed, or initial 
approximation. The better this approximation is, the 
faster will be the convergence. You can store predeter- 
mined seed values in a look-up table. This method 
usually requires extra hardware (in the form of ROMs), 
but it gives you flexibility, because you can store seed 
values that are as accurate as you want. 

The chief attraction of the Newton-Raphson method 
is its rapid convergence; the number of iterations 
required is low. The method converges quadratically; 
that is, each iteration squares the error. 
For example, if the seed is accurate to eight bits, the 
first iteration improves the accuracy to 16 bits, and the 
second iteration improves it to approximately 32 bits 
(the exact gain depends on the magnitude of the error). 
The math processor shown in Fig 1 evaluates 
Chebyshev and Newton-Raphson approximations very 
efficiently. The system performs transcendental (trigo- 
nometric, logarithmic, and exponential) functions by 
the Chebyshev method and division and square-root 
extraction by the Newton-Raphson method. 


Understand the algorithms 

The algorithms for 10 very common math functions 
are described below. You'll need these functions for 
applications associated with navigation, guidance, 
image processing, signal processing, and many other 
areas. The algorithms for the transcendental functions 
are based on the Chebyshev method and consist of a 
3-stage process. The first stage reduces the range of 
Fig 1—This math subsystem is based on a VLSI floating-point processor. It performs math functions with high precision and a large 
dynamic range. 

Deriving a Chebyshev series 


The Chebyshev series expansion 
is a procedure for generating a 
polynomial approximation for a 
given math function, f(x). To 
expand the function, you must 
express it as a Chebyshev 


series: 


$f(x) = 0.5C_0 + C_1T_1(x) + C_2T_2(x) + \cdots$


for $-1 \le x \le 1$, where $T_n(x)$ is the 
Chebyshev polynomial of degree 
n given by 


$T_n(x) = \cos(n \arccos(x))$


and $C_n$ is a coefficient of the 
Chebyshev series. The value of 
$C_n$ is dependent upon the 
function f(x). You can determine 
the value of $C_n$ by evaluating the 
following relationship: 


$C_n = \frac{2}{\pi} \int_{-1}^{1} \frac{f(x)\,T_n(x)}{\sqrt{1 - x^2}}\,dx$


Alternatively, you can obtain the 
$C_n$ coefficients in tabular form, 
for a wide variety of functions, 
from books on mathematical 
tables (Ref 2). 

Examples of the $T_n(x)$ 
polynomial include the following: 


$T_0(x) = \cos(0) = 1$
$T_1(x) = \cos(\arccos(x)) = x$
$T_2(x) = \cos(2\arccos(x)) = 2\cos^2(\arccos(x)) - 1 = 2x^2 - 1$


You can generate a polynomial 
equation for a function by 
combining the above equations 
and combining terms with 
common exponents. The 
accuracy of the result depends 
upon the number of terms you 
use. (If you are interested in a 
formal derivation of the 
Chebyshev method, see Refs 1 
and 2.) 


Expansion for sine function 


If you want to find the 
Chebyshev expansion for the 
sine function, first go to the 
coefficient tables in Ref 2 and 
look up the coefficients for the 
sine function (or calculate them 
from the formula given above). 
Next, determine the number of 
coefficients required to provide 
the accuracy you want. For 
example, to achieve 24 bits of 
accuracy, the error should be no 
greater than one part in 17 
million. Compare the magnitude 
of this largest acceptable error 
with each of the coefficients. 
The first term that contains a 
coefficient that’s less than the 
error can be the last term in the 
series. It’s common practice, 
however, to include one extra 
term in the series. 

Using the above criteria, you 
need only six coefficients for the 
sine function using $\sin(\frac{1}{2}\pi x)$ in 
order to obtain a result that’s 
accurate to 24 bits. These 
coefficients are 


• $C_0 = C_{\sin 0} = +2.552557925$
• $C_1 = C_{\sin 1} = -0.285261569$
• $C_2 = C_{\sin 2} = +9.118016007 \times 10^{-3}$
• $C_3 = C_{\sin 3} = -1.365875135 \times 10^{-4}$
• $C_4 = C_{\sin 4} = +1.184961858 \times 10^{-6}$
• $C_5 = C_{\sin 5} = -6.702792 \times 10^{-9}$


Substituting the $T_n(x)$ 
polynomials into the Chebyshev 
series gives 


$\sin(\frac{1}{2}\pi x) = 0.5C_0 + C_1x + C_2(2x^2 - 1) + C_3(4x^3 - 3x) + C_4(8x^4 - 8x^2 + 1) + C_5(16x^5 - 20x^3 + 5x)$.


Simplifying the terms gives 


$\sin(\frac{1}{2}\pi x) = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5$,


where 


$a_0 = (0.5)C_0 - C_2 + C_4$
$a_1 = C_1 - 3C_3 + 5C_5$
$a_2 = 2C_2 - 8C_4$
$a_3 = 4C_3 - 20C_5$
$a_4 = 8C_4$
$a_5 = 16C_5$.


The final result for the sine 
function is a simple polynomial 
equation that you'll find easy to 
implement. You can precalculate 
the coefficients $a_0$ through $a_5$ and 
store them in a ROM table. You 
can apply the same procedure to 
any well-behaved function for 
which you can find or compute 
the Chebyshev coefficients. 
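
If no coefficient table is at hand, the coefficients can be estimated numerically. The Python sketch below approximates the integral for $C_n$ with Gauss-Chebyshev quadrature. It rests on an assumption that is not stated in the box: the six sine coefficients listed there appear to belong to the function sin(πx/2)/x written in the variable t = 2x² − 1, which is the form the article later uses as CSERIES_sin(2x² − 1). The helper names `chebyshev_coefficients` and `g` are hypothetical.

```python
import math

def chebyshev_coefficients(f, count, nodes=64):
    """Approximate C_0..C_{count-1} with Gauss-Chebyshev quadrature.

    Uses the convention f(t) = 0.5*C_0 + C_1*T_1(t) + C_2*T_2(t) + ...
    """
    thetas = [math.pi * (k + 0.5) / nodes for k in range(nodes)]
    samples = [f(math.cos(th)) for th in thetas]
    return [
        2.0 / nodes * sum(s * math.cos(n * th) for s, th in zip(samples, thetas))
        for n in range(count)
    ]

def g(t):
    """sin(pi*x/2)/x as a function of t = 2x^2 - 1 (assumed mapping, see lead-in)."""
    x = math.sqrt((t + 1.0) / 2.0)
    return math.pi / 2 if x == 0.0 else math.sin(math.pi * x / 2) / x

C = chebyshev_coefficients(g, 6)
for n, c in enumerate(C):
    print(f"C{n} = {c:+.9e}")
```

The magnitudes fall off quickly from one coefficient to the next, which is why a handful of terms is enough for 24-bit accuracy.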










the input arguments to values between +1 and −1, 
because the Chebyshev expansion operates only over 
this range. The second stage evaluates the polynomial 
derived from the Chebyshev expansion. The third 
stage performs any postprocessing that may be re- 
quired, such as correction of the sign. 

The detailed descriptions were developed by 
Clenshaw, Miller, and Woodger (Ref 1). They use the 
terms RND and CSERIES: RND indicates that the 
result of the operation must be rounded towards minus 
infinity, and CSERIES indicates that the Chebyshev 
series for the input must be evaluated. 


Range reduction prepares arguments 

The range-reduction steps for the sine function are 

• $x = x(2/\pi)$
• $x = x - 4(\mathrm{RND}(0.25(x + 1)))$
• If $x > 1$ then $x = 2 - x$.
As noted, these steps reduce the input argument to the 
range $-1 \le x \le 1$. You then evaluate the sine function by 
summing the terms of the following polynomial equation 
derived for the sine function: 


$\sin(x) = x(\mathrm{CSERIES}_{\sin}(2x^2 - 1))$.


The range-reduction steps for the cosine function are 
• $x = x(2/\pi)$
• $x = 4(\mathrm{RND}(0.25(x + 2))) - x + 1$
• If $x > 1$ then $x = 2 - x$.
You then evaluate the cosine function by using the same 
polynomial equation as for the sine function: 


$\cos(x) = x(\mathrm{CSERIES}_{\sin}(2x^2 - 1))$.
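
The sine and cosine recipes above can be tried out in a few lines of Python. This is a sketch under stated assumptions: RND is taken to be `math.floor` (round toward minus infinity), the series is evaluated with Clenshaw's recurrence rather than with whatever microcode sequence the article's hardware uses, and the coefficient list is the one from the box, with the fourth value taken as −1.365875135 × 10⁻⁴, which is what a numerical check of the series gives.

```python
import math

# Chebyshev coefficients for sin(pi*x/2)/x in T_n(2x^2 - 1), from the box
# "Deriving a Chebyshev series" (fourth value as assumed in the lead-in).
C_SIN = [2.552557925, -0.285261569, 9.118016007e-3,
         -1.365875135e-4, 1.184961858e-6, -6.702792e-9]

def cseries(coeffs, t):
    """Evaluate 0.5*c0 + sum(c_k * T_k(t)) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + 0.5 * coeffs[0]

def reduce_and_evaluate(x):
    if x > 1.0:
        x = 2.0 - x                              # reflect into [-1, 1]
    return x * cseries(C_SIN, 2.0 * x * x - 1.0)

def sine(angle):
    x = angle * (2.0 / math.pi)                  # angle in quarter turns
    x = x - 4.0 * math.floor(0.25 * (x + 1.0))   # RND = round toward -infinity
    return reduce_and_evaluate(x)

def cosine(angle):
    x = angle * (2.0 / math.pi)
    x = 4.0 * math.floor(0.25 * (x + 2.0)) - x + 1.0
    return reduce_and_evaluate(x)

for a in (0.0, 1.0, 2.5, -7.0):
    print(a, sine(a) - math.sin(a), cosine(a) - math.cos(a))
```

The printed differences show how closely the six-term series tracks the library functions over several periods.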


The range-reduction steps for the tangent function 
are 

• $x = x(2/\pi)$
• $x = x - 4(\mathrm{RND}(0.25(x + 1)))$
• $y = x$
• If $x > 1$ then $x = 2 - x$.
The Chebyshev polynomial evaluation for the tangent 
function is 


$\tan(x) = x(\mathrm{CSERIES}_{\tan}(2x^2 - 1))$.

You have to perform one postprocessing step: 
If $y > 1$ then $\tan(x) = 1/\tan(x)$.


You don’t need any range-reduction steps for the 
arcsine function, because all values outside the range 
$-1 \le x \le 1$ indicate an error condition. For input 
arguments in the range $x^2 \le 1/2$, you evaluate the arcsine as 
follows: 


$\mathrm{asin}(x) = x(\sqrt{2}(\mathrm{CSERIES}_{\mathrm{asin}}(4x^2 - 1)))$.


For input arguments in the range $1/2 < x^2 \le 1$, you evaluate 
the arcsine as follows: 


$\mathrm{asin}(x) = \mathrm{sign}(x)((\pi/2) - \sqrt{2 - 2x^2}\,(\mathrm{CSERIES}_{\mathrm{asin}}(3 - 4x^2)))$,


where sign(x) is the sign of x. 
You use the following trigonometric identity to evaluate 
the arc-cosine function: 


$\mathrm{acos}(x) = \pi/2 - \mathrm{asin}(x)$.


The range-reduction steps for the arctangent func- 
tion are 

• $u = x$
• If $\mathrm{ABS}(x) > 1$ then $x = 1/x$,
where ABS(x) is the absolute value of x. The 
Chebyshev polynomial evaluation is 


$\mathrm{atan}(x) = x(\mathrm{CSERIES}_{\mathrm{atan}}(2x^2 - 1))$.

The postprocessing steps are 


If $u > 1$ then $\mathrm{atan}(x) = +(\pi/2) - \mathrm{atan}(x)$
and 
If $u < -1$ then $\mathrm{atan}(x) = -(\pi/2) - \mathrm{atan}(x)$.


The range-reduction steps for the exponentiation 
function are 

• $x = x(\log_2 e)$
• $N = 1 + \mathrm{RND}(x)$.
The Chebyshev polynomial evaluation is 


$\exp(x) = 2^N(\mathrm{CSERIES}_{\exp}(2(N - x) - 1))$.


Only positive values are valid input arguments for 
the natural-log function; a zero or a negative value 
should be flagged as an error: 


$\ln(x) = (\mathrm{CSERIES}_{\ln}(4(\mathrm{mant}(x)) - 3)) + (\mathrm{expo}(x) - 1)(\ln(2))$,


where mant(x) is the mantissa value of x, expo(x) is the 
exponent value of x, and ln(2) is a constant value. 
You perform division operations by evaluating the 
reciprocal function. For example, you can express the 
division operation C=A/B in its reciprocal form, 
C=A(1/B). By using the Newton-Raphson method, you 
can find an iterative expression for the reciprocal 
function. This expression is 


$x_{i+1} = x_i(2 - B(x_i))$,


where $x_0$ is the initial divisor reciprocal (seed value) for 
i=0, and $x_i$ is the ith approximation. 
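
A few lines of Python show the reciprocal iteration and its quadratic convergence. The function name `reciprocal`, the divisor 3.0, and the seed 0.3 are arbitrary choices for this illustration.

```python
def reciprocal(b, seed, iterations):
    """Refine seed toward 1/b with x <- x * (2 - b*x); returns every iterate."""
    x = seed
    history = [x]
    for _ in range(iterations):
        x = x * (2.0 - b * x)
        history.append(x)
    return history

b = 3.0
history = reciprocal(b, seed=0.3, iterations=4)
for x in history:
    print(f"x = {x:.15f}   relative error = {abs(1.0 - b * x):.3e}")

quotient = 7.0 * history[-1]   # C = A(1/B) with A = 7
```

Each printed relative error is roughly the square of the one before it, so the number of correct bits doubles per pass until it reaches the limit of the arithmetic.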

The square-root function also uses the Newton- 
Raphson method. The iterative expression for the 
inverse square-root function is 


$x_{i+1} = 0.5(x_i(3.0 - Ax_i^2))$.

You then evaluate the square root of A by the equation 

$B = A(x_{i+1})$,


where A is the input argument, B is the square root of 
A, $x_0$ is the initial approximation (seed value) for i=0, 
and $x_i$ is the ith approximation. 
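
The same pattern applies to the square root. In this Python sketch the function name `square_root`, the seed 0.7, and the iteration count are illustrative assumptions; note that the loop itself needs only multiplication and subtraction.

```python
import math

def square_root(a, seed, iterations):
    """Iterate x <- 0.5 * x * (3 - a*x*x) toward 1/sqrt(a), then return a*x."""
    x = seed
    for _ in range(iterations):
        x = 0.5 * (x * (3.0 - a * x * x))
    return a * x

root = square_root(2.0, seed=0.7, iterations=4)
print(root, math.sqrt(2.0))
```

Iterating on the inverse square root avoids any division inside the loop; the final multiplication by A converts the result to the square root.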

The principal component of the math-processor sub- 
system described here is the Am29325 floating-point 
processor. The subsystem also contains RAM, bipolar 
PROMs to store coefficients, a subsystem controller, 
and a host interface. The floating-point processor per- 
forms all computations under control of the subsystem 
controller; microcoded programs to perform the func- 
tions you need reside in the subsystem controller’s 
PROM. If you wish to modify existing functions or add 
new functions, you merely change the microprogram- 
med PROM. 

The Am29325 floating-point processor (Fig 2) pro- 
vides many features that simplify subsystem design. 
The 3-port, 32-bit I/O structure of the Am29325 avoids 
data multiplexing and allows efficient transfer of infor- 
mation. The 32-bit internal registers and data paths 
allow the chip to store the results of intermediate 
calculations for use in subsequent operations, thereby 
avoiding the delays that transfer of these results to and 
from off-chip storage would entail. Many functions don’t 
need to send data out of the chip until the final results of 
an operation are ready. 

The floating-point-processor hardware detects exceptional 
conditions and, rather than compounding the 
error until the end of the calculation, immediately 
notifies the host system. The chip notifies the host by 
means of flags that indicate underflow, overflow, invalid 
operation, and other error conditions. 

Fig 2—This VLSI floating-point processor is fast because it contains 
all the major components for 32-bit operations on a single chip. It has 
one input for an external clock and 17 inputs for instruction-select 
and control functions. 

Subsystem data storage consists of a high-speed, 
4-port RAM. You can load the data memory from the 
host computer (using DMA), from the floating-point 
processor, or from an integer processor. You'll need to 
process integers during operations such as isolating the 
exponent and mantissa portions of a floating-point 
word. You can have the host processor perform integer 
processing, or you can arrange it so that the math 
subsystem performs the required operations by incor- 
porating an integer processor chip in your design. 


Learn to microprogram the processor 

Two examples of how to implement math functions on 
the Am29325 floating-point processor will give you an 
introduction to the microcoding procedures you'll use in 
the math processor. Recall that, for a given division 
operation (C=A/B), the Newton-Raphson division algorithm 
begins by obtaining the reciprocal of the divisor 
by means of an iterative equation. A single iteration 
requires just three arithmetic operations: 

• multiplication: $B(x_i) = u$
• subtraction: $2 - u = v$
• multiplication: $v(x_i) = x_{i+1}$.
You can microcode this procedure with a 3-instruction 
loop that you repeat until you obtain a sufficiently 
accurate value of $x_{i+1}$. You then perform a single 
multiplication, $A \times x_{i+1}$, to obtain the quotient. 

The math processor uses the Newton-Raphson 
method to execute the division 
and square-root functions. 

The conventional way to obtain a seed is to use the 
most significant 16 or so bits of the divisor as a pointer 
into a look-up table in ROM; the contents of the address 
to which the divisor bits point become the seed output, 
which usually has approximately the same number of 
bits. You might think that use of a 16-bit address would 
require a ROM that’s 64k words deep, but this is not so. 
In floating-point division, you can reciprocate the expo- 
nent and significand separately, each from its own 
table, and then recombine them. Consequently, for an 
8-bit exponent and the eight most significant bits of the 
significand, you require only two tables, each just 256 
words deep. 
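
The table-lookup seed can be sketched in Python as follows. Several details here are assumptions, since the article does not give the table contents: each of the 256 significand entries holds the reciprocal of the midpoint of its interval, the exponent "table" is replaced by a simple negation via `math.frexp` and `math.ldexp`, and only positive divisors are handled.

```python
import math

# 256-entry significand table, indexed by the eight bits after the leading 1.
# Each entry is the reciprocal of the midpoint of its interval (an assumption).
SIG_TABLE = [1.0 / (0.5 + (i + 0.5) / 512.0) for i in range(256)]

def seed_reciprocal(b):
    """Look up a seed for 1/b (b > 0) from the significand and the exponent."""
    m, e = math.frexp(b)                      # b = m * 2**e with 0.5 <= m < 1
    index = int((m - 0.5) * 512.0)            # top eight fraction bits of m
    return math.ldexp(SIG_TABLE[index], -e)   # exponent reciprocated separately

def divide(a, b, iterations=2):
    x = seed_reciprocal(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)                 # Newton-Raphson refinement
    return a * x

print(divide(1.0, 3.0), divide(22.0, 7.0))
```

With this table the seed is good to about nine bits, so two iterations already exceed single-precision accuracy; a wider table entry would buy back roughly one iteration, as the text goes on to note.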

You can also trade ROM word width for execution 
time (ie, the number of iterations); doubling the width 
of the significand stored in ROM will reduce reciprocal 
refinement time by roughly one iteration. Convergence 
is specified by the inequality $2/B > |x_0| > 0$. 

The microcoding for the complete Newton-Raphson 
division is shown in Table 1. The operation requires six 
lines of microcode. In cycle 1, you load the seed into 
register R of the floating-point processor and load the 
divisor into register S. In cycle 2, you multiply the 
contents of registers R and S; the result appears in 
register F. 

In cycle 3, you perform the subtraction, using the 
2−S instruction of the floating-point processor. The 
input for port S comes from register F via the internal 
feedback path. The result of the subtraction appears in 
register F. 

In cycle 4, you perform the second multiplication. 
This operation multiplies the contents of register F (via 
port S) by $x_i$ (from register R). The result, $x_{i+1}$, replaces 
$x_i$ in register R. In parallel with the multiplication, the 
microsequencer executes a jump back to cycle 2 to 
begin the next iteration. 

Cycle 5 begins after the last iteration of cycles 2 
through 4. In this cycle, you load the dividend (A) into 
register S and multiply it by the contents of register R 
to produce the final result. This result appears in 
register F, from which you can unload it via the F bus 
to local data storage or to the host. 

The second implementation example uses the 
Chebyshev method to perform a sine calculation. The 
polynomial equation that evaluates the sine function is 


$\mathrm{CSERIES}_{\sin} = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5$.


The range-reduction steps require eight or nine oper- 
ations. Evaluation of the polynomial equation requires 
23 additional operations, including processing of the 
2x’—1 expression. One final operation multiplies the 
result of the polynomial evaluation by x. The sine 
function therefore requires 32 or 33 operations. 

The floating-point processor hardware detects 
exceptional conditions and, rather 
than compounding the error, immediately 
notifies the host system. 


You can, however, save 10 cycles in the evaluation of 
the polynomial equation by applying Horner’s Rule, an 
algebraic method for rearranging components in a 
polynomial. The polynomial equation then becomes 


$\mathrm{CSERIES}_{\sin} = ((((a_5x + a_4)x + a_3)x + a_2)x + a_1)x + a_0$.


The total number of operations in the sine function then 
decreases to 22 or 23. Evaluation of the rearranged 
polynomial equation is complete in 10 clock cycles. 

In cycle 1, you load x into the S register and $a_5$ into 
the R register. Multiply these two operands to produce 
$a_5x$. In cycle 2, you load the result of the multiplication 
into the F register, load $a_4$ into the R register, and add 
the contents of the F and R registers to yield 


$(a_5 \times x) + a_4$.


In cycle 3, you load the result of the addition into the 
R register; the S register still contains x. Perform R×S 
to obtain 


$((a_5 \times x) + a_4)x$.


Cycles 4 through 10 perform similar addition and 
multiplication operations, progressively using the 
terms $a_3$ through $a_0$. The final result of evaluating the 
polynomial equation is available in the F register after 
cycle 10. 
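
The cycle-by-cycle description maps directly onto a short loop. In the Python sketch below, the function name `horner`, the operation counter, and the coefficient values are illustrative only; the coefficients are arbitrary numbers, not the sine coefficients.

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n, with coeffs = [a0, a1, ..., an].

    Returns the value and the number of arithmetic operations performed.
    """
    result = coeffs[-1]
    ops = 0
    for a in reversed(coeffs[:-1]):
        result = result * x + a      # one multiply and one add per coefficient
        ops += 2
    return result, ops

coeffs = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0]   # arbitrary a0..a5, for illustration
value, ops = horner(coeffs, 0.5)
print(value, ops)
```

For a fifth-degree polynomial the loop performs five multiplications and five additions, which matches the 10 clock cycles quoted above.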

The ability to perform both simple and complex math 
functions rapidly is critical in systems that process data 
in real time. You won’t yet find many simple, compact 
solutions to this problem on the market. Math-coproc- 
essor ICs are available, but they are still in the low- to 
medium-performance range, and they limit you to a 
microprocessor environment. (Table 2 shows compara- 


tive timings for two floating-point coprocessor chips 
and the Am29325 floating-point processor.) 

You can design and build your own MSI chip, but such 
a product will require much development time and cost, 
and it will probably be large and consume lots of power. 
Another possible approach is to compute the values of 
the math functions you will need and to store these 
values in ROM, but such a look-up-table method is 
adequate only for small amounts of data. At the present 
time, the use of a math subsystem based upon a VLSI 
floating-point processor with a relatively small amount 
of support circuitry appears to be the most cost- 
effective solution. EDN 
TABLE 2—TIMING COMPARISON 
OF SINGLE-PRECISION FLOATING-POINT FUNCTIONS 


                      FLOATING-POINT
                      MULTIPLY    DIVISION    COSINE    TANGENT    SQUARE ROOT
                      (µSEC)      (µSEC)      (µSEC)    (µSEC)     (µSEC)
INTEL 8087 (1)
MOTOROLA 68881 (2)
AMD Am29325


NOTES: 
N/A = TIMES NOT AVAILABLE. 
1. TIMES FOR THE INTEL 8087 WERE DERIVED FROM THE INSTRUCTION CLOCK COUNT GIVEN IN THE INTEL DATA PAMPHLET (1984). ALL 
TIMES LISTED ARE WORST CASE. 
2. TIMES FOR THE MOTOROLA MC68881 WERE TAKEN FROM A NEWS ITEM IN ELECTRONIC PRODUCTS, FEBRUARY 15, 1985, PG 43. 
3. THIS OPERATION IS NOT COVERED BY THE INSTRUCTION SET AND MUST BE IMPLEMENTED BY USING OTHER INSTRUCTIONS. 
Optimize your 
graphics system 
for 2-D and 3-D 





The design of a graphics system that’s both 
2-dimensional and 3-dimensional poses 
some conflicting requirements. You can 
reconcile some of these conflicts, however, 
through careful design of the frame-buffer 
structure, and you can achieve adequate 
speed for 3-D applications by using parallel 
processors for computation-intensive tasks. 


Anoop S Khurana and Olivier Garbe, 
Advanced Micro Devices Inc 


A graphics system that will handle both 2- and 3- 
dimensional applications presents design requirements 
that are at odds with one another. These conflicts arise 
from the fundamental differences in the nature of the 
geometry-, pixel-, and display-processing tasks re- 
quired by the two systems. A system with a micropro- 
grammed architecture can help you avoid the difficul- 
ties you’d encounter in reconciling these differences. 
You’d use a 2-D graphics system with such graphics 
editors as MacDraw, MacPaint, and Interleaf, or with 
CAE programs such as schematic-capture packages or 
layout editors for pc-board design. You’d need a 3-D 
system, on the other hand, to display 3-D wire-frame 


models, to model solids for mechanical design, or to 
produce visually pleasing 3-D pictures for animation. 

One of the major differences lies in the size of the 
frame buffer needed, and the speed with which the host 
computer can obtain access to it. Most 2-D systems 
need only eight bits to define a pixel color as one of 256 
simultaneously displayable colors. A 3-D system, on the 
other hand, needs eight bits each for red (R), green (G), 
and blue (B)—a total of 24 bits per pixel. Also, 2-D 
pixel-processing operations require fast access to multi- 
ple pixels during the same frame-buffer cycle. In a 3-D 
system, by contrast, pixel-processing operations (such 
as Gouraud shading) are computation-intensive but 
require access to only one pixel at a time. 

Similarly, geometry-processing operations are more
arithmetic-intensive in 3-D than in 2-D systems. Fixed-point,
32-bit arithmetic provides adequate computational
power and speed for many 2-D applications,
whereas 3-D applications need the speed and versatility 
of fast floating-point arithmetic. 

Most of the graphics systems available today, includ- 
ing engineering workstations, are optimized for 2-D 
graphics operations; if they have 3-D capabilities, they 
perform the required processing mainly in software, 
which is slow. To obtain adequate speed, then, serious 
users of 3-D graphics find that they need a separate 
system that’s optimized for 3-D graphics, resulting in 
an expensive duplication of hardware and software. 

A 2-dimensional graphics system can handle
diagrams, but you need 3-dimensional
capability for mechanical modeling.

You can avoid these disadvantages by designing a
single graphics system that provides all the features
necessary for both 2-D and 3-D graphics. You’ll find a
microprogrammed architecture ideal for such a system,
because such an architecture lets you customize the
data paths and computational resources to a particular
application and to the performance level that you want.
It also lets you integrate both fast integer and fast
floating-point arithmetic capabilities, both of which are
necessary for complex graphics operations, into a single
system.

As an example of such a system, consider the design 
of a graphics peripheral for a conventional minicomput- 
er. This peripheral can act as a bus master on the host’s 
system bus, but it need not do so. The application 
program runs on the host computer and generates a 
display list, defining the image, which the CPU passes 
to the graphics peripheral via a DMA channel (or by 
any other appropriate means). The graphics peripheral 
processes this display list to generate the image. (The
steps that convert a display list to an image on the
screen are collectively referred to as the “graphics
pipeline”; see box, “From object to image: the graphics
pipeline.”) The three main functional blocks of the
system are the communications and display-list handler;
an update processor that performs geometry and
pixel processing; and a display controller (Fig 1).

A conventional, general-purpose, 16- or 32-bit µP,
which has its own memory and DMA channel, receives 
and executes commands issued by the host. This com- 
munications processor can directly execute some host 
commands, such as Load Display-List. Other com- 
mands, such as Render Display-List, involve the rest of 
the graphics system; the communications processor 
analyzes these commands and dispatches appropriate 
commands to the update processor, using a message- 
based protocol and a fast, dual-access memory block 
that serves as a mailbox. 


From object to image: the graphics pipeline

The graphics pipeline is the sequence of operations that
translates the user’s description of a scene into a viewable
image. The four stages in this process are display-list handling,
geometry processing, pixel processing, and display control.

The display-list handler helps the user or the application
program decompose objects to be depicted into a display list. The
display list is usually hierarchical, and it embodies the
structure inherent in the object being modeled. Leaf nodes in the
hierarchy are drawing primitives provided by the graphics system.

The geometry processor performs viewing- and
perspective-transformation operations on the display list, and it
clips objects against the boundaries of the viewing volume. You
can decompose the complex primitives used by the geometry
processor, such as patches or cubic curves, into simpler
primitives, such as polygons or lines.

The pixel processor physically writes all the pixels affected by
a primitive into their correct locations in the frame buffer. It
also performs all operations, such as pixel-block transfers, that
require pixels to be read from or written to the frame buffer.

The display controller converts the pixel values stored in the
frame buffer into a standard video signal. This video signal,
when transmitted to a suitable monitor, builds the desired image
on the screen.

A single, general-purpose processor, such as the Intel 80286,
along with the 80287 numeric coprocessor, can perform all the
operations in the graphics pipeline sequentially. In such a
system, the main processor writes the final value of each pixel
to the frame buffer, which forms part of the address space of the
main processor. This configuration is relatively slow, however,
and the speed may be inadequate for 3-D applications.

You can achieve improved performance by using specialized VLSI
peripheral devices, such as the Am95C60 Quad Pixel Dataflow
Manager, to speed some of the operations in the graphics
pipeline. Most current graphics peripherals relieve the main
processor of most of the pixel-processing tasks. Typical
functions performed by such peripherals are line drawing, polygon
filling, and block transfer of pixels. Because these tasks are
relatively standard and are well suited to implementation in
high-performance silicon, graphics peripherals yield a
substantial improvement in system performance. You can achieve a
similar improvement by using high-performance floating-point
processors to speed the computation-intensive geometry-processing
tasks.

For even higher performance and functionality, you should
consider the use of multiprocessing systems that provide one or
more processors for each stage in the graphics pipeline. Two
factors contribute to the improvement in performance that such
systems yield. First, because most graphics operations are vector
operations, the concurrent performance of several parts of a task
can yield a speed increase that’s proportional to the number of
processors available. Second, you can fine-tune the system by
customizing it for highest performance in just those operations
that the applications require.

[Figure: the graphics pipeline. Blocks: APPLICATION PROGRAM;
DISPLAY-LIST CREATION AND TRAVERSAL (a MODEL HOUSE built from
RECTANGLE and PENTAGON primitives); VIEWING MODEL; GEOMETRY
PROCESSING; PIXEL PROCESSING; DISPLAY PROCESSOR.]

The graphics pipeline consists of the processing steps needed to convert a graphics object description, in digital form, into a viewable
image on the screen.

[Figure: block diagram of the GRAPHICS SUBSYSTEM, including the
MICROPROGRAMMED UPDATE PROCESSOR and the MONITOR.]

Fig 1—A graphics subsystem is ideally an intelligent peripheral that accepts a display list from the host computer and converts the digital
representation of an image into a standard video signal that creates a screen display.

A microprogrammed architecture lets you
customize the resources of the system to the
problem you’re trying to solve.

The dual ports of the mailbox allow the update
processor to read a command while the communications
processor is sending a subsequent command. Semaphores,
also located in the mailbox RAM, govern both
command chaining and the allocation of memory to
message buffers.

The microprogrammed update processor executes all
commands that are related to geometry or pixel processing.
Such operations may update the pixel data in
the frame buffer, or they may pass a message back to
the communications processor.

The frame buffer uses video RAM (VRAM) ICs, both
to maximize bandwidth and to minimize the quantity of
hardware needed for refreshing the image. The frame-buffer
controller provides all the signals needed for
reading, writing, and refreshing the VRAMs, and for
performing all video-refresh functions.

You’ll need to organize the structure of the frame 
buffer carefully to make the most efficient use of the 
available storage. As noted, for 2-D displays you need 
only eight bits per pixel, which allows you to display the 
pixel in one of 256 colors. For 3-D displays, you need at 
least 24 bits per pixel (eight each for the R, G, and B 
channels); you may also need, for each pixel, an addi- 
tional eight bits for the alpha channel and 16 or 32 bits 
for the Z buffer (a maximum of 64 bits/pixel). 

You can reduce the total number of bits per pixel by
mapping the Z buffer into a portion of the frame buffer.
For example, in a 2k-pixel x 1k-line buffer, you could
map a 1k x 1k-pixel screen into the first 1k pixels of each
line and the Z buffer into the second 1k pixels. Consequently,
you could access the Z value of a pixel by
adding an offset of 1024 to the pixel address. You would
need two memory cycles to access both the RGB and the
Z values of the pixel. This structure, however, has the
great advantage that no bits are irrevocably dedicated
to the Z buffer. If you don’t need a Z buffer, this
memory becomes available for general use.
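
The Z-buffer mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names and the linear `y * 2048 + x` word-address layout are assumptions, since the article specifies only the 1024-pixel offset between a pixel and its Z value.

```python
LINE_WIDTH = 2048   # words per line in the 2k x 1k buffer
SCREEN_SIZE = 1024  # the visible screen is 1k x 1k pixels
Z_OFFSET = 1024     # Z values occupy the second 1k words of each line

def pixel_address(x, y):
    """Word address of the RGB value of screen pixel (x, y) (assumed layout)."""
    if not (0 <= x < SCREEN_SIZE and 0 <= y < SCREEN_SIZE):
        raise ValueError("pixel outside the 1k x 1k screen")
    return y * LINE_WIDTH + x

def z_address(x, y):
    """Word address of the Z value belonging to the same pixel."""
    return pixel_address(x, y) + Z_OFFSET
```

Fetching both values of one pixel therefore means two accesses, one at `pixel_address(x, y)` and one at `z_address(x, y)`, which matches the two memory cycles mentioned in the text.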

You’ll still have to resolve the discrepancy between
the eight bits/pixel needed for 2-D and the 24 bits/pixel
needed for 3-D. Your first thought might be to allocate a
32-bit memory word for each pixel, but then you’d be
wasting 24 bits in 2-D operations. A better solution is to
allow each 32-bit word to be treated as four adjacent
8-bit pixels in 2-D. You could then reorganize a
2k x 1k x 32-bit memory as a frame buffer of 8k x 1k x 8
bits. This organization allows you to store one 3-D
screen with a resolution of 1024 pixels x 1024 lines x 32
planes, or several 2-D screens at once.
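
As a small illustration of treating one 32-bit word as four adjacent 8-bit pixels, the sketch below packs and unpacks such a word. The helper names are hypothetical, and placing pixel 0 in the least significant byte is an assumption; the article does not state the byte order.

```python
PIXELS_PER_WORD = 4  # four 8-bit pixels share one 32-bit word in 2-D mode

def pack_pixels(pixels):
    """Pack four 8-bit pixels into a 32-bit word (pixel 0 in the low byte)."""
    if len(pixels) != PIXELS_PER_WORD or any(not 0 <= p <= 0xFF for p in pixels):
        raise ValueError("expected four values in the range 0..255")
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (8 * i)
    return word

def unpack_pixels(word):
    """Split a 32-bit word back into its four 8-bit pixels."""
    return [(word >> (8 * i)) & 0xFF for i in range(PIXELS_PER_WORD)]
```

With this view, the 2k x 1k words of the buffer hold 2048 x 4 = 8192 pixels per line, which is the 8k x 1k x 8-bit organization quoted above.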

The frame buffer in our example consists of 64k x 4-bit
VRAMs and uses the shifter port of each VRAM for
video refreshing; the update processor therefore has
virtually unlimited access to the frame buffer. It’s
possible to organize each VRAM as a 256 x 256 x 4-bit
square area of memory; using this area as a building
block, you can create a 2k x 1k x 4-bit memory array
having four rows and eight columns (Fig 2). If you want
to extend the depth of the array to 32 bits/pixel, you’ll
need eight VRAMs in each element (called a bank) of
the array.

The video display controller (VDC) provides complete
control of the frame buffer, both for update
operations and for video-refresh operations. In response
to a read or write memory-cycle request from
the update processor, the VDC generates the appropriate
VRAM-control signals (RAS, CAS, etc). If a dynamic-RAM
refresh cycle or a transfer cycle for video
refresh is already in progress, however, the VDC
delays execution of the update cycle until the higher-priority
cycle is finished.

Because each access to the frame buffer reads or 
writes a 32-bit word, the 2kx1kx32-bit frame buffer 
requires 21 address lines, of which 11 define the X 
address and the other 10 define the Y address within 
the array. In the 3-D 32-bit/pixel mode, each 32-bit 
word in the frame buffer represents one pixel. 

In the 2-D 8-bit/pixel mode, each 32-bit word represents
four pixels. The 18 most significant address bits
select the 8-bit row address, the 8-bit column address,
and RAS strobe signals. Decoding the three least
significant bits yields a decode signal that selects one of
eight adjacent pixels.

The capacitive loading imposed by the VRAMs makes 
it necessary to buffer the address and control outputs of 
the display controller. To reduce skew between signals, 
and thereby achieve a shorter memory-cycle time, you 
can buffer the address, RAS, CAS, and XF/G signals
within a single IC package, such as the Am2976 11-bit 
dynamic memory driver used in this example. 





Select one of eight pixels 

Each of the four rows in the frame memory receives
a separate RAS signal. You can therefore connect to a
common 32-bit bus the data ports of all four banks of
VRAMs within a column. Each memory cycle now gives
access to eight pixels, one from each column. The
update processor operates on only 32 bits at a time,
however, so you’ll need a mechanism to select just one of
the eight available words.

You can perform this 8:1 multiplexing quite simply by
decoding the three least significant address bits to
obtain the CAS signal. As a result, only one bank in
memory receives both RAS and CAS. Consequently,
you can tie together the outputs of all 32 banks in
memory, but only the selected bank will drive the bus.
To access eight sequential pixels, then, you’d need eight
memory cycles.
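
The address decoding described above can be sketched as follows. The field layout is an assumption made for illustration: the article gives an 11-bit X address, a 10-bit Y address, and a 3-bit decode that selects the bank receiving CAS, but it does not say exactly which bits feed the 8-bit row address, the 8-bit column address, and the RAS strobes. The sketch assumes `address = (Y << 11) | X`, with the top two Y bits choosing one of the four bank rows.

```python
ADDRESS_BITS = 21  # 11 X bits plus 10 Y bits

def decode_address(addr):
    """Split a 21-bit frame-buffer word address into assumed VRAM fields."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 21 bits")
    x = addr & 0x7FF   # 11-bit X address
    y = addr >> 11     # 10-bit Y address
    return {
        "cas_bank": x & 0x7,   # 3 LSBs: which of the eight column banks gets CAS
        "vram_col": x >> 3,    # 8-bit column address inside each VRAM
        "vram_row": y & 0xFF,  # 8-bit row address inside each VRAM
        "ras_row": y >> 8,     # which of the four bank rows gets RAS
    }
```

Under this assumed layout, eight consecutive addresses share the same row, column, and RAS selection and differ only in `cas_bank`, so they correspond to the eight words that one memory cycle makes available.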

There’s another way to perform the multiplexing,
however—one that gives the update processor very
rapid random access to any or all of the eight adjacent
pixels addressed in a single memory cycle. This method
requires eight 32-bit, bidirectional, bus-interface registers.
You connect the eight 32-bit words, accessed in
parallel from the memory, independently to one port of
these registers. To the other port you tie corresponding
bits of each register together to form a single 32-bit bus
that leads to the update processor. You then perform
the 8:1 multiplexing by controlling the output-enable
signals of the registers.

Fig 2—This frame buffer is organized as 2k pixels x 1k lines x 32 bits. Three-dimensional applications can read or write eight adjacent pixels
at one time. For 2-D applications, each 32-bit word represents four 8-bit pixels.

EDN March 18, 1987

A microprogrammed graphics system acts
as a peripheral on the host computer’s system
bus.

The update processor regards the registers as inde- 
pendent 8-pixel input and output buffers. A memory- 
read operation fills the input buffer, and the update 
processor can fetch any or all of the eight pixels much 
more quickly than if a separate memory cycle were 
required for each one. You can also provide two differ- 
ent write modes. In the first mode, the update pro- 
cessor writes just one pixel to the appropriate place in 
memory. In the second mode, the update processor fills 
all eight registers, and the memory cycle writes their 
contents to eight different pixels simultaneously. 

Refreshing the video display is easy when the display
memory consists of VRAMs. At every vertical-sync
(Vsync) pulse, the display controller resets an internal
video-refresh counter to the address of the upper-left
corner of the screen. At every horizontal-sync (Hsync)
pulse, the controller initiates a transfer cycle that
transfers data for the next scan line into the VRAMs’
shift registers and then increments its internal address
counter to point to the start of the data for the next
line. You can perform panning and scrolling simply by
changing the address held in the controller’s top-of-frame
register.

Fig 3—You’ll need two video shift registers if you want to reconfigure the frame buffer from 32-bit, 3-D pixels to 8-bit, 2-D pixels or vice versa.
The main register handles eight sequential 32-bit pixels; the secondary register reformats the RGB bit streams from the main register into RGB
streams representing 8-bit pixels.

Given that there are eight memory banks per row,
and that each VRAM is capable of shifting at a clock
speed of 25 MHz, a total bandwidth of 200M pixels/sec
is possible in 3-D mode. In 2-D mode, the available
bandwidth becomes 800M pixels/sec. The maximum
pixel bandwidth is therefore limited mainly by the
characteristics of the shift registers and the associated
D/A converter, not by those of the memory.
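
The bandwidth figures quoted above follow from a simple product, shown here as a short Python check. The function name and parameters are hypothetical; the only inputs taken from the article are the eight banks per row, the 25-MHz shift clock, and the four 8-bit pixels per word in 2-D mode.

```python
def pixel_bandwidth(banks=8, shift_clock_hz=25_000_000, pixels_per_word=1):
    """Peak pixels per second delivered by the VRAM shift registers."""
    return banks * shift_clock_hz * pixels_per_word

bandwidth_3d = pixel_bandwidth()                   # one 32-bit pixel per word
bandwidth_2d = pixel_bandwidth(pixels_per_word=4)  # four 8-bit pixels per word
```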

In 32-bit/pixel mode, strobe signals generated by the 
video clock generator—in this example, an Am8158— 
load into the video shift registers the eight sequential 
32-bit pixels that are in parallel on the video bus (Fig 
3). The video shift registers consist of 16 dual, 8-bit, 
parallel-in, serial-out ECL shift-register ICs. These 
ICs produce serial bit streams of the R, G, and B values 
of each pixel and forward these bit streams to a triple 
8-bit D/A converter. 

In 8-bit/pixel mode, the 32 bits that appear at the R, 
G, and B outputs of the shift registers actually repre- 
sent four pixels. Four 4-bit ECL shift registers convert 
the 32-bit data into four 8-bit pixels for use by the 
Am8151 ECL color palette. To change from one mode to 
the other, you need only make the appropriate modifi- 
cations to the Shift and Load signals to the shift 
registers. 

The Am8158 generates the pixel clock pulse and some 
of the Shift and Load signals used by the shift regis- 
ters. This IC also generates the Vsync, Hsync, and 
Blank pulses. The display controller uses these signals 
to initiate VRAM transfer cycles, and the D/A convert- 
ers use them to force the video signals to the appropri- 
ate sync or blank levels. You can program all the 
important parameters of these signals using registers 
contained in the Am8158. 


The update processor is microprogrammed 

The update processor performs all pixel- and geome- 
try-processing functions for both 2-D and 3-D graphics. 
These functions require powerful and versatile data- 
transfer capability coupled with fast integer and float- 
ing-point arithmetic. Implementing the update pro- 
cessor as a microprogrammed subsystem allows you to 
achieve the high performance that you need. 

The major functional blocks and buses of the update 
processor are shown in Fig 4. The main data path in this 
example consists of the Am29332 integer ALU, the 
Am29323 integer multiplier, and the vector floating- 
point arithmetic unit, which consists of two Am29325 
ICs. Each of these units accepts data from two common 
32-bit input buses and places its results on one common 
32-bit output bus (the main data bus). 

An Am29334 register file provides storage for fre- 
quently accessed data. Its read ports supply data to the 
arithmetic unit’s input buses. It also has two write 
ports, one of which accepts data from the main data 
bus, while the other transfers the result of an ALU 
operation back to the register file without using the
main data bus. The system timing is such that the ALU 
can fetch two operands from the register file, process 
them, and write the result back to the register file 
within a single microcycle. 

The update processor addresses 64k 32-bit words of 
high-speed local data memory, which consists of static 
RAM. An Am2131 dual-port message-buffer IC occu- 
pies 1k words of the 64k-word address space. To allow 
the main ALU to process video data at maximum 
efficiency, an auxiliary Am29C101 16-bit ALU performs 
all local-memory address computation; the outputs of 
this ALU are captured in a 16-bit address register. 
Random accesses to local memory therefore take two 
microcycles—one to compute and latch the address, and 
another to access the RAM. During consecutive memo- 
ry accesses, however, next-word computation overlaps 
the current RAM access, so that the second and subse- 
quent memory accesses are completed in a single micro- 
cycle. 

The frame-buffer-address generator consists of pre- 
settable up/down counters (an 11-bit counter for the X 
address and a 10-bit counter for the Y address). The 
sequencer loads these counters via the main data bus. 
Although the main ALU is primarily responsible for 
generating frame-buffer addresses, use of the counters 
speeds the critical loops in curve drawing and other 
pixel-processing functions. 

The update processor is configured with a single level 
of pipelining, so that next-address computation over- 
laps execution of the current microinstruction. The 
Am29331 sequencer computes the address of the next 
instruction in response to its instruction inputs, and it 
places the result on its Y output bus. For access to 
sequential microcode addresses, this result is simply 
the contents of the program counter. The sequencer 
uses an internal stack to store count values for nested 
loops and return addresses for calls to microcode sub- 
routines. 

To execute a jump to an address defined by the 
microcode, the sequencer connects the address section 
of the microinstruction word back into its program 
counter via the A bus. To allow the computation of jump 
addresses at run time, and to allow external examina- 
tion of the sequencer’s stack and stack pointer, the D 
bus connects to the main system bus. 

The organization of the frame buffer is the
key to resolving conflicts between 2-D and
3-D requirements.

An internal condition-code multiplexer, controlled by
microcode, selects and enables one of the condition
inputs of the sequencer; the sequencer can then test
that condition and jump according to the state of the
selected input. For testing as many as four conditions
simultaneously, a PAL device accepts all the signals
that need to be tested simultaneously and encodes them
into four fields of four bits each. A base address is
assigned to each field, and the state of the field defines
one of 16 sequential locations as an offset from the base
address. The sequencer can then examine one of these
fields and jump to the location defined by the state of
that field. You can use this capability to advantage in a
line-clipping algorithm.
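
One way to picture the 16-way branch is with a 4-bit region code of the kind used in outcode-based line clipping. The sketch below is an illustration only: the article does not say which clipping algorithm or which condition bits are used, so the `outcode` bit assignments and the `branch_target` helper are assumptions.

```python
def outcode(x, y, xmin, ymin, xmax, ymax):
    """4-bit code with one bit per violated window boundary (assumed bit order)."""
    code = 0
    if x < xmin:
        code |= 1
    if x > xmax:
        code |= 2
    if y < ymin:
        code |= 4
    if y > ymax:
        code |= 8
    return code

def branch_target(base_address, field):
    """Microcode address selected by a 4-bit field: base plus an offset of 0..15."""
    if not 0 <= field <= 15:
        raise ValueError("field must fit in four bits")
    return base_address + field
```

A single dispatch such as `branch_target(0x100, outcode(x, y, 0, 0, 1023, 1023))` then replaces a chain of four separate condition tests.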

Fig 4—This update processor, which handles all geometry- and pixel-processing operations, uses a microprogrammed sequencer for control
and parallel floating-point processors for vector operations.

In the 2-D mode, one of the most important pixel-processing
operations is the movement of a rectangular
block of pixels from one area of the frame buffer to
another. This process, also known as BitBlt, may also
require the execution of a logical operation during the
transfer. The update processor transfers data one row
at a time from the source block to the destination block.
Within a row, the processor may transfer data either
left to right or right to left. The sole reason for
including the feature that provides fast access to eight
pixels in the frame buffer is to speed block transfer. In
the 32-bit/pixel mode, the algorithm that transfers one
row of the source block to the corresponding row in the
destination block has four steps, as illustrated in Fig 5a
and described as follows:

• Read memory with X=24. This operation transfers
pixels 24 through 31 into the frame buffer’s read
registers. Next, read pixels 30 and 31 into the register
file. Then read memory again with X=32. Read five
pixels (32 through 36) into the register file. You have
now transferred the first seven pixels from the source
region into the register file (there are only seven valid
pixels in the first destination read cycle).

• Read memory with X=96. This operation transfers
seven valid destination pixels into the frame buffer’s
registers.

• Read each valid destination pixel, one at a time,
and perform any required logical operation with the
corresponding source pixel in the register file. Write
the resulting pixel back into the frame buffer’s write
registers. Copy each unread destination pixel from the
input register to the output register.

• Write the eight destination pixels in the output
registers back to memory. Repeat the sequence until
you have transferred the entire row.
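
The steps above can be modeled in software to make the alignment handling concrete. The following Python sketch is a simplified illustration, not the article's microcode: `blit_row` and `op` are hypothetical names, a row is modeled as a list whose length is a multiple of eight, and the whole source row is staged before any destination group is touched, whereas the article stages only as many source pixels as the next destination group needs.

```python
GROUP = 8  # pixels fetched per memory cycle in 32-bit/pixel mode

def blit_row(row, src_x, dst_x, width, op=lambda s, d: s):
    """Copy `width` pixels within one row, one aligned 8-pixel group at a time."""
    # Stage the valid source pixels (the register file in the article).
    staged = []
    g = (src_x // GROUP) * GROUP
    while g < src_x + width:
        group = row[g:g + GROUP]              # one memory read fills the read registers
        for i, pixel in enumerate(group):
            if src_x <= g + i < src_x + width:
                staged.append(pixel)
        g += GROUP
    # Read-modify-write each aligned destination group.
    g = (dst_x // GROUP) * GROUP
    while g < dst_x + width:
        group = row[g:g + GROUP]              # read the destination group
        for i in range(len(group)):
            k = g + i - dst_x
            if 0 <= k < width:
                group[i] = op(staged[k], group[i])  # valid pixel: apply the logical op
            # pixels outside the block pass through unchanged
        row[g:g + GROUP] = group              # write all eight pixels back at once
        g += GROUP
    return row
```

Calling `blit_row(row, 30, 97, 21)` reproduces the case in Fig 5a: the source occupies X=30 through X=50, and the first destination group (X=96 through X=103) contains seven valid pixels.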

Assuming that a memory-read cycle takes 300 nsec 
and that each frame-buffer read or write operation 
takes 100 nsec, the total transfer time is 500 nsec/pixel. 
Using this algorithm, an average covering all possible 
alignments of source and destination turns out to be 
approximately 600 nsec/pixel. This time is a substantial 
improvement over the time of 1200 nsec/pixel for the 
case in which each memory cycle accesses a single pixel, 
and it’s an acceptable data-transfer speed for 32-bit 
pixels. 

In the 8-bit/pixel mode, the block-transfer algorithm
must take into account different alignments of the
source and destination within a 32-bit word, and it
requires a modification of the procedure. The modified
algorithm, illustrated in Fig 5b, is as follows:

• Read source words 1 and 2 simultaneously from
both output ports of the register file. Using the
Am29332 funnel shifter, extract four bytes aligned with
the destination, and write this 32-bit word back to a
temporary location in the register file. In the example
shown, you need to extract the last three pixels of word
1 and pixel S2 from word 2.

• Read this aligned source location, using one register-file
port. Read the destination pixel from the frame
buffer via the main bus into the second register-file
port.

• Perform the logical operation on the aligned-source
and destination pixels, using the mask generated
internally by the ALU; doing so leaves the first pixel
unchanged by the logical operation. Write the result,
which appears at the ALU’s outputs, back to the frame
buffer’s input registers at the end of the cycle.

[Fig 5 panels: (a) one source row (X=30 to X=50) and the
corresponding destination row (X=97 to X=117); (b) source row
starting at X=30 and destination row starting at X=97.]

Fig 5—Pixel block-transfers need careful alignment of the source and destination within a group of pixels. In 32-bit/pixel mode (a), the
group is eight pixels wide. In 8-bit/pixel mode (b), the group is four pixels wide.

The update processor needs fast access to
several pixels at a time in the frame
buffer.

Step 3 of the algorithm now takes three microcycles
per word instead of two, and it changes the average
transfer time to just over 600 nsec per word. Because
each word contains four pixels, the average pixel-transfer
time is 600÷4=150 nsec/pixel. This pixel-transfer
rate allows an entire 1k x 1k-pixel screen to be
updated in 150 msec, or about 10 frame times, and is
sufficient for displaying text and manipulating
windows.
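
Two parts of the 8-bit/pixel procedure lend themselves to a short sketch: the funnel-shift alignment and the timing arithmetic. The code below is a software model under stated assumptions, not a description of the Am29332's instruction set: `funnel_extract` and `screen_update_time` are hypothetical names, and pixel 0 is assumed to sit in the least significant byte of each word.

```python
def funnel_extract(word1, word2, pixel_offset):
    """Return four consecutive 8-bit pixels starting `pixel_offset` pixels into word1.

    The two 32-bit source words are concatenated (word2 above word1) and a
    32-bit window is taken, which is what a funnel shifter does in one pass.
    """
    if not 0 <= pixel_offset <= 4:
        raise ValueError("offset must be between 0 and 4 pixels")
    combined = (word2 << 32) | word1
    return (combined >> (8 * pixel_offset)) & 0xFFFFFFFF

def screen_update_time(width=1024, height=1024, nsec_per_word=600, pixels_per_word=4):
    """Seconds needed to transfer a full screen at the quoted average rate."""
    nsec_per_pixel = nsec_per_word / pixels_per_word
    return width * height * nsec_per_pixel * 1e-9
```

With an offset of one pixel, `funnel_extract` returns the last three pixels of the first word followed by the first pixel of the second, which is the case described in the first step. `screen_update_time()` evaluates to roughly 0.157 sec, in line with the figure of about 150 msec quoted above.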

It’s not difficult to implement line- and circle-drawing
algorithms, such as those of Bresenham, in microcode.
The inner loop of Bresenham’s line-drawing algorithm
will require three microcycles. Because this time is
equal to the time needed to access a pixel in the frame
buffer, you can plot pixels at the pixel-access speed of
the memory. However, because this algorithm does not
profit from the fast access to sequential pixels, the
plotting speed will be about the same in both the
32-bit/pixel and the 8-bit/pixel modes. The inner loop of
Bresenham’s circle-drawing algorithm will require four
microcycles, and because each iteration through the
loop generates eight points that must be plotted in
separate memory cycles, circles too are drawn at the
rate of about one pixel in every frame-buffer access
time.
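
For reference, a common integer-only formulation of Bresenham's line algorithm is sketched below in Python. It is a general software version that handles all octants, not the article's three-microcycle inner loop; the function name is an illustrative choice.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the list of pixels on a line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # running error term, integers only
    while True:
        points.append((x0, y0))   # one frame-buffer write per iteration
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # step in X
            err += dy
            x0 += sx
        if e2 <= dx:              # step in Y
            err += dx
            y0 += sy
    return points
```

Each pass through the loop produces exactly one pixel at an address that is generally not adjacent in memory to the previous one, which is why the 8-pixel read and write buffers do not help here.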

Typical pixel- and geometry-processing operations in 
a 3-D system are computation-intensive and require 
that you carefully consider the design of the arithmetic 
unit. Integer arithmetic, although fast, is unsuitable for 
these graphics operations. Fixed-point arithmetic has 
disadvantages as well. Although you can readily per- 
form most pixel-processing functions using 32-bit fixed- 
point arithmetic, fixed-point geometry-processing op- 
erations require time-consuming pre- and postscaling 
operations. For this reason, floating-point operations 
are easier to develop and are more general in character. 
Furthermore, there are now many inexpensive floating- 
point chips, which are almost as fast as integer units 
and provide all the computation power you need. 


Fig 6—This SIMD floating-point unit has four sections that share a common control bus. All four sections concurrently perform the same
operation on different data.

In a graphics system, most of the arithmetic computations
are vector operations, because points, plane-equations,
transformation matrices, and other common
data structures are all vectors. For example, you can 
represent a point in 3-D space, in homogeneous form, as 
the vector (x y z w). Although a single processor can 
perform vector operations sequentially, a multiple- 
processor system that uses four ICs (in this example, 
Am29325s) is much faster. If you can distribute the 
computation tasks among the four processors in such a 
way that you keep each processor busy all of the time, 
you can expect to achieve four times the performance of 
a single processor. 

Fortunately, it’s quite easy to distribute the simple
vector operations that are useful in graphics. For example,
perspective division on a point (x y z w) in homogeneous
coordinates yields (x/w y/w z/w 1). Consequently,
you can perform these divisions in parallel on four 
different processors, and you can arrange for algo- 
rithms that do not map onto such an architecture to run 
(though more slowly) on a single processor as a se- 
quence of scalar operations. Furthermore, the fact that 
all processors perform the same operation (division, in 
this example) at the same time (but on different data) 
suggests that you should design the floating-point unit 
as a single-instruction, multiple-data (SIMD) machine,
whose processors share a common instruction bus. 
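
A small Python model of the perspective division may help to show why the operation suits a SIMD unit. This is a conceptual sketch only: the function name is hypothetical, and each element of the tuple simply stands in for one of the four floating-point processors.

```python
def perspective_divide(point):
    """Divide every component of a homogeneous point (x, y, z, w) by w."""
    x, y, z, w = point
    if w == 0:
        raise ZeroDivisionError("w must be nonzero for perspective division")
    lanes = (x, y, z, w)  # one component per processor
    # Every lane executes the same instruction (divide by w) on its own data.
    return tuple(component / w for component in lanes)
```

All four lanes perform the same division with a shared divisor, so one instruction broadcast on the common instruction bus is enough to drive them.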

You can see the overall structure of a 4-processor 
SIMD floating-point unit in Fig 6. Each section consists 
of a floating-point processor, a register file, and a seed 
ROM (Fig 7). In each section, a 64-word area of the 
stack constitutes the register file, and you can address 
data in the register file with a 6-bit negative displace- 
ment from the stack pointer. The microcode word 
therefore contains four 6-bit fields to specify the ad- 
dresses of the four ports on the register file. The 
stack-addressing capability allows microcode subrou- 
tines to be completely general in character, and if you 
first load the stack pointer with zero, you can use the 
microcode-word displacement fields to specify absolute 
addresses. 

TABLE 1—TRANSFORMATION OF A 3-D POINT

CYCLE | EXECUTE | READ/WRITE
1 | | READ: Y_A=R=ST(0), Y_B=S=ST(4)
[Cycles 2 through 9: the alignment of the following entries to
cycles is not preserved in this copy; they are listed in source
order.]
READ: Y_A=R=ST(1), Y_B=S=ST(5)
EXECUTE: R=R*S
EXECUTE: F=F+R
EXECUTE: R=R*S
EXECUTE: F=F+R | READ: Y_A=R=ST(3), Y_B=S=ST(7)
EXECUTE: R=R*S
EXECUTE: F=F+R
WRITE: D_A=F, OUTPUT REGISTER=F (OPTIONAL)
READ: Y_A=R=ST(2), Y_B=S=ST(6)

The seven instruction bits of the main microcode
word, when decoded, provide all the output-enable and
multiplexer-select signals needed to reflect all possible
arithmetic-operation and source/destination combinations.
Twenty-four bits specify the addresses for the
four ports of the register file, two bits control write
operations on the D_A and D_B ports of the register file,
and one bit switches the source-select multiplexer located
at the register file’s D_A input. Two additional bits
determine whether the stack pointer is to be left
unchanged, incremented, decremented, or loaded from
the data bus.

A data-access microcycle consists of three time slots. 
In the first slot, the address hardware computes regis- 
ter-file addresses by adding the displacement specified 
in the microcode word to the current contents of the 
stack pointer. In the second slot, data is written into 
the register file. In the last slot, data required for the 
next execution cycle is read from the register file. 

The pipelined structure of the floating-point unit 
allows the overlapping of arithmetic operations with 
operations that access data from the register file. As a 
rule, the floating-point unit must access data from the 
register file one microcycle before using that data in an 
arithmetic operation. In many cases, however, the data 
needed for the next operation is already held in the 
Am29325’s internal registers, so that a register-access 
cycle is unnecessary. Furthermore, most graphics op- 
erations allow execution cycles to overlap data-access 
cycles in a similar manner. Consequently, the effective 
throughput of the floating-point unit remains close to 
one operation per microcycle. 


Guidelines for coding typical operations 


As an example of how you can distribute portions of 
an operation among the four processors, consider the 
transformation of a 3-D point in homogeneous coordinates, 
using a 4×4 matrix. The first step is to broadcast 
all four coordinates of the point to be transformed, and 
to write them into the register files of all four sections 
of the floating-point unit simultaneously. Because the 
register file also acts as the matrix stack, the transfor- 
mation matrix is already established in the floating- 
point unit. You then distribute the transformation 
matrix among the four sections, storing only one col- 
umn of the matrix in each section. 

Assume that the point to be transformed is on top of 
the stack at [ST(0) ST(1) ST(2) ST(3)], and that the 
matrix column is at [ST(4) ST(5) ST(6) ST(7)], where 
ST(n) refers to the data n words down from the current 
stack pointer. You perform the transformation by com- 
puting the dot product of the point and a column of the 
transformation matrix. You can now compute, in parallel, 
the four dot products needed to transform each 
component of the vector, one in each section of the 
floating-point unit. The entire transformation can 
complete within nine microcycles (Table 1). 

The update processor is configured with a 
single level of pipelining, so that next-address 
computation overlaps execution of the 
current microinstruction. 
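
The four-section scheme described above can be sketched in software. This is a minimal illustration, not the unit's microcode: the function names are hypothetical, and the four per-section calls run one after another here, whereas the hardware would run them at the same time.

```python
# Sketch of the SIMD point transform: every section holds the whole point
# and ONE column of the 4x4 matrix, and computes a single dot product.
# Function names are illustrative assumptions, not part of the original design.

def section_dot(point, column):
    """Work done by one section: four multiplies accumulated into F."""
    f = 0.0
    for r, s in zip(point, column):
        f += r * s  # R = R*S followed by F = F+R
    return f


def transform_point(point, matrix):
    """Row vector times 4x4 matrix (matrix given as a list of four rows)."""
    columns = [[matrix[row][col] for row in range(4)] for col in range(4)]
    # One call per section; in hardware these four proceed in parallel.
    return [section_dot(point, columns[sec]) for sec in range(4)]


def transform_matrix(new_matrix, current):
    """Matrix-matrix product: treat each row of new_matrix as a point."""
    return [transform_point(row, current) for row in new_matrix]


translate = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [2.0, 3.0, 4.0, 1.0],
]
print(transform_point([1.0, 1.0, 1.0, 1.0], translate))
```

Because a matrix-matrix product is four point transforms, the 36-microcycle figure quoted in the text is simply four times the nine-microcycle point transform.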

You can use the same approach to perform matrix- 
matrix multiplication. In this case, assume that the 
current transformation is on top of the stack, with one 
column in each section. You can now treat a row of the 
new matrix as a point and transform it by the matrix 
held on top of the stack to yield a row of the transformed 
matrix. You repeat this procedure four times 
(once for each row) to obtain the complete result. A 
matrix-matrix multiplication therefore takes 36 microcycles. 

Fig 7. Each section of the SIMD floating-point unit is identical with the others, and each has its own register file, seed and constant table, 
and floating-point processor. (Diagram labels: one of four identical sections; seed and constant ROM; data; output latch.) 

You can also perform parallel interpolation, using 
forward differences, when drawing cubic curves such as 
splines and Bezier curves. In this case, each iteration 
requires three addition operations, and because each 
component of the vector requires an identical computation, 
you can perform the four computations in parallel 
in the four sections. Consequently, you can compute a 
new point every four microcycles. In the computation 
shown below, Dx, D2x, and D3x are the first-, second-, 
and third-order forward differences for the X coordinate: 


[X Dx D2x D3x] = [X Dx D2x D3x] + [Dx D2x D3x 0] 
[Y Dy D2y D3y] = [Y Dy D2y D3y] + [Dy D2y D3y 0] 
[Z Dz D2z D3z] = [Z Dz D2z D3z] + [Dz D2z D3z 0] 
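
The update above is ordinary forward differencing. The following sketch shows it for one coordinate; the helper names and the way the initial differences are obtained (by sampling the cubic at four evenly spaced parameter values) are assumptions made for illustration.

```python
# Forward-difference evaluation of a cubic, one coordinate at a time.
# Helper names are illustrative; the source gives only the update rule.

def initial_differences(f, h):
    """Return [f(0), D1, D2, D3] for step size h by sampling f four times."""
    f0, f1, f2, f3 = f(0 * h), f(1 * h), f(2 * h), f(3 * h)
    return [f0, f1 - f0, f2 - 2 * f1 + f0, f3 - 3 * f2 + 3 * f1 - f0]


def step(state):
    """[X D1 D2 D3] = [X D1 D2 D3] + [D1 D2 D3 0]: three additions."""
    x, d1, d2, d3 = state
    return [x + d1, d1 + d2, d2 + d3, d3]


state = initial_differences(lambda t: t ** 3, 1)
points = []
for _ in range(4):
    points.append(state[0])
    state = step(state)
print(points)  # successive values of t**3 at t = 0, 1, 2, 3
```

Each coordinate needs only three additions per new point and no multiplications, which is why the four sections can each advance one coordinate in lockstep.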


Perspective division requires a division operation, 
and the normalization of an interpolated vector, in the 
inner loop of Phong shading, requires square-root operations. 
The Am29325 does not perform division and 
square roots directly, however. Instead, it uses Newton-Raphson 
iteration to obtain the corresponding results. 
The seed ROM provides the seed (or first approximation) 
to start the iteration procedure. Each iteration 
requires three microcycles for division and five microcycles 
for square roots. Refining the seed to approximately 
single-precision accuracy requires another three 
microcycles. Consequently, each division operation re- 
quires a total of ten microcycles, and each square-root 
operation requires sixteen microcycles. Furthermore, 
because each processor in the floating-point unit has its 
own seed table, four such computations can proceed in 
parallel. EDN 
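
As an illustration of seeded Newton-Raphson division, the sketch below refines a reciprocal estimate with the standard recurrence x = x(2 - bx). The seed function is an assumption standing in for the seed ROM (a linear approximation on the mantissa), and the iteration count of three is chosen for the example rather than taken from the text.

```python
import math

# Newton-Raphson reciprocal, assuming positive divisors only.
# seed_reciprocal() is a hypothetical stand-in for the seed ROM lookup.

def seed_reciprocal(b):
    """Coarse first approximation to 1/b from a linear fit on the mantissa."""
    m, e = math.frexp(b)  # b = m * 2**e with 0.5 <= m < 1
    return math.ldexp(48.0 / 17.0 - (32.0 / 17.0) * m, -e)


def divide(a, b, iterations=3):
    """Approximate a / b using multiply and subtract operations only."""
    x = seed_reciprocal(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)  # each pass roughly doubles the correct bits
    return a * x


print(divide(1.0, 3.0))
print(divide(10.0, 4.0))
```

Square roots can be handled with a similar recurrence on the reciprocal square root; that variant is omitted here because the source does not state which formulation the unit uses.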
DESIGN APPLICATIONS 

Variable-width FIFO buffer 
sequences large data words 

Fast systems gain from a cascadable device supporting 
everything from [...] pipelines to peripheral host adapters. 

Tim Olson 

Advanced Micro Devices Inc., 901 Thompson Pl., P.O. Box 3453, Sunnyvale, CA 94088; (408) 732-2400. 


First-in, first-out (FIFO) buffers are a popular 
means of matching different data rates in large digital 
systems. I/O controllers for character-oriented 
devices like terminals, for example, usually return 
or receive one 8-bit byte on a slow but regular basis. 
In contrast, block-oriented devices, such as high-speed 
disks, must move large chunks of data from 
peripherals to the host bus with great speed. 

The demand for larger, denser data-processing 
systems has spurred the development of FIFO buffers 
with deeper memory but unchanged width. 
Cascading these buffers horizontally or vertically 
is still the most common and cost-efficient 
method of expanding both the width and 
depth of a data queue. 


Even this solution 
has shortcomings. 
FIFO buffers usually 
link devices of like width but do not possess the req- 
uisite logic to cope with, say, transferring data be- 
tween a 32-bit-wide memory, 16- or 32-bit data bus- 
es, and an 8-bit peripheral bus. To further 
complicate matters, some of the newer variable- 
width instruction architectures must buffer in- 
struction words varying in width from 8 to 128 bits 
at any particular cycle. 

In short, as both synchronous and asynchronous 
systems push toward larger or disparate data 
widths, it becomes more difficult to cascade with 
typical 8- and 9-bit-wide FIFO buffers in a rudi- 
mentary fashion. Designers are seeking an efficient 
solution for matching data widths as well as data 
rates. 

"Reprinted with permission from Electronic Design, 
Vol. 35 No. 14, July 11, 1987. Copyright 1987 
Hayden Publishing Co., Inc." 

One of the best devices for such matching is the 
Am29338 Byte Queue FIFO buffer. The general-purpose, 
32-bit-wide buffer is organized as four 
dual-ported RAMs, each 9 bits (1 byte plus parity) 
wide and 32 bytes deep (Fig. 1a). Each RAM section 
has its own queue (load) and dequeue (unload) 
pointers (Fig. 1b) and supplies byte-wise (that is, 
byte-by-byte) parity checking at the buffer’s input 
and output. A Byte Count output shows the current 
number of bytes in the queue. The RAMs are organized 
so that a variable number of bytes can be 
queued or dequeued at any cycle. The device can 
queue or dequeue from zero to four 8-bit bytes of 
data in one 80-ns cycle. Ultimately, this feature can 
be used to queue data at one width and dequeue it at 
another. For example, two 16-bit half words may be 
queued sequentially and dequeued as one 32-bit 
word. In addition, the Am29338 can be cascaded 
horizontally to release up to 16 data bytes (128 bits) 
per cycle. 
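
The width-conversion behavior can be modeled in a few lines. This is a behavioral sketch only: the class and method names are hypothetical, parity and clocking are ignored, and the 128-byte capacity is inferred from the four 32-byte RAMs described above.

```python
from collections import deque

# Behavioral model of a variable-width byte queue (names are illustrative).

class ByteQueue:
    CAPACITY = 4 * 32  # four RAM sections, each 32 bytes deep

    def __init__(self):
        self._bytes = deque()

    @property
    def count(self):
        """Plays the role of the Byte Count output."""
        return len(self._bytes)

    def queue(self, data):
        """Load zero to four bytes in one cycle."""
        if not 0 <= len(data) <= 4:
            raise ValueError("0 to 4 bytes per cycle")
        if self.count + len(data) > self.CAPACITY:
            raise OverflowError("queue full")
        self._bytes.extend(data)

    def dequeue(self, n):
        """Unload zero to four bytes in one cycle."""
        if not 0 <= n <= 4:
            raise ValueError("0 to 4 bytes per cycle")
        if n > self.count:
            raise IndexError("not enough data queued")
        return bytes(self._bytes.popleft() for _ in range(n))


q = ByteQueue()
q.queue(b"\x12\x34")  # first 16-bit half word
q.queue(b"\x56\x78")  # second 16-bit half word
print(q.dequeue(4).hex())  # both half words leave as one 32-bit word
```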

The Am29338 also addresses the problem of byte 
ordering, a side effect of the evolution of memory 
word widths from 8 to 16 to 32 bits. Byte ordering is 
simply the order in which bytes appear in a word. 
The Am29338 performs byte swapping to effect 
any type of byte-ordering scheme. Two signals, for 
example, allow bytes to be swapped within 16-bit 
half words and 32-bit half words, respectively. To- 
gether, they make possible four separate byte order- 
ings (Fig. 2). 
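
To make the four orderings concrete, here is a small sketch with two swap controls. It assumes that one control swaps the two bytes inside each 16-bit half word and the other swaps the two half words inside the 32-bit word; the article does not spell out the exact signal semantics, so treat both the function name and that interpretation as assumptions.

```python
# Two independent swap controls yield four byte orderings of a 4-byte word.
# The meaning given to each control here is an assumption for illustration.

def reorder(word, swap_bytes, swap_halves):
    b = list(word)
    if swap_bytes:   # exchange the bytes within each 16-bit half word
        b = [b[1], b[0], b[3], b[2]]
    if swap_halves:  # exchange the two half words within the 32-bit word
        b = [b[2], b[3], b[0], b[1]]
    return bytes(b)


for swap_bytes in (False, True):
    for swap_halves in (False, True):
        print(swap_bytes, swap_halves, reorder(b"ABCD", swap_bytes, swap_halves))
```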

Like the rest of the Am29300 family of 32-bit mi- 
croprogrammable building blocks, the Am29338 is 
implemented in ECL (packaged in a 120-pin pin- 
grid-array) but is interfaced with TTL-level de- 
vices. Because it is RAM-based, the buffer has an 
almost zero fall-through delay, suiting it to applications 
where data must be immediately available 
after a queueing operation. 

This feature best suits systems with variable data 
widths, especially instruction-prefetching pipe- 
lines, I/O peripheral buffers, and hardware 
mailboxes. 


AN INSTRUCTION-PREFETCH QUEUE 


Instruction-prefetch queues, of course, separate 
instruction fetching from instruction execution for 
parallel execution of the two tasks. Between jumps 
from one operation to the other, a sequential instruction 
stream is fetched from memory and 
placed in the prefetch queue. This occurs independently 
of the rate at which the instructions are decoded and executed. 
Because many computer architectures work with 
variable-length instructions, the Am29338, which can release 
data of different widths, greatly simplifies prefetch-queue 
designs. Fixed-width words can be queued from 
memory while variable-length instructions are dequeued. 

4. The Am29338 Byte Queue from AMD is a general-purpose, 
32-bit FIFO buffer with four 8-by-32-bit RAM 

The Am29338 buffer can function as an instruction- 
prefetch queue, where it is synchronized with a separate 
instruction-fetch unit (Fig. 3). In operation, sequential 
32-bit memory locations are fetched by the instruction- 
fetch unit and are stacked in the byte queue. Each time 
the CPU needs an instruction, it takes the next bytes in 
the byte queue rather than addressing main memory. The 
CPU can determine the instruction length from the first 
byte of the instruction and updates the dequeue pointer in 
the byte queue; that is, it tells the byte queue which bytes 
it wants to see. The instruction length is determined by 
the 4-bit word on the Bytes Dequeued (BDQ) lines while 
the Dequeue Clock (DQCLK) line releases the bytes 
from the queue. If a jump in the instruction sequence (the 
program) occurs, the instruction-fetch unit must flush 
the byte queue by asserting the Reset line and issuing a 
new instruction address. 
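
The prefetch flow described above can be sketched as a loop in which a fetch unit queues fixed 32-bit words while the CPU dequeues variable-length instructions. The instruction encoding is hypothetical (the first byte holds the instruction length, 1 to 4), as are the function and variable names; jumps and the Reset flush are left out to keep the example short.

```python
from collections import deque

# Fixed-width words in, variable-length instructions out.
# Hypothetical encoding: the first byte of each instruction is its length.

def run_prefetch(memory_words, max_instructions):
    queue = deque()
    fetched = 0
    executed = []
    while len(executed) < max_instructions:
        # Fetch unit: queue the next 4-byte word while there is room.
        if fetched < len(memory_words) and len(queue) <= 124:
            queue.extend(memory_words[fetched])
            fetched += 1
        if not queue:
            break  # nothing left to decode
        length = queue[0]  # CPU inspects the first byte
        if not 1 <= length <= 4:
            raise ValueError("bad instruction length")
        if length > len(queue):
            if fetched == len(memory_words):
                break  # truncated instruction at end of memory
            continue   # wait for the fetch unit to supply more bytes
        executed.append(bytes(queue.popleft() for _ in range(length)))
    return executed


program = [bytes([2, 0xAA, 1, 3]), bytes([0xBB, 0xCC, 1, 1])]
for instruction in run_prefetch(program, 10):
    print(instruction.hex())
```

Note that the three-byte instruction straddles two memory words, yet the CPU side never sees the word boundary.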


EXECUTING SMALL LOOPS 


The Byte Count (CNT) indicator can serve as a tool to 
limit the buffer’s depth. For instance, jump or branch 
instructions usually account for about 20% of a typical 
instruction mix. When a jump occurs, instructions stored in 
the instruction-prefetch queue are discarded. To limit 
instruction-prefetching operations and conserve memory 
bandwidth, the user can sound an alarm when the fetch 
buffer’s depth surpasses five or six instructions. 

Many operations, however, can be executed with small 
loops, which fit entirely in the prefetch queue and can be 
controlled with the assertion of the retransmit lines 
(RXMIT) and with a small amount of external hardware. 
The Am29338 buffer can rapidly retransmit stored block 
data without requeuing from main memory, assuming 
that 128 bytes or less have been queued since the last as- 
sertion of a Reset command. This is done by first bringing 
the RXMIT line low. When this happens, the chip’s inter- 
nal dequeue pointers are directed to the first RAM loca- 
tion, and the internal queue pointers are not reset. The 
data in the locations between the old queue pointers and 
the new dequeue pointers can then be unloaded. RXMIT 
is useful for redundant instruction sequences because the 
CPU can run faster without having to refetch instructions 
from memory or cache. 

New applications open the door for instructions far in 
excess of 32 bits, particularly in systems that use large, 
variable-length instructions spanning many bytes. To 
meet this challenge in the synchronous mode, up to four 
Am29338s may be cascaded horizontally to free up to 16 
consecutive bytes (one 128-bit word) for dequeueing in 
one cycle (Fig. 4a). Because each cascaded part is con- 
nected to a common 32-bit input bus, each chip holds the 
same information (Fig. 4b). When the Reset (or RXMIT) 
line is asserted, however, the internal dequeue pointers 
are offset by the value programmed on the chip’s position 
inputs, POS. 

Another frequent task for first-in, first-out buffers is as 
a straightforward I/O buffer. Many processor-memory 
systems have expanded their word length from 8 to 32 
bits, though the peripheral-controller chips have for the 
most part remained at 8 bits. The Am29338 buffer sup- 
plies a buffered path between peripherals and memory 
while making the necessary conversion from one word 
size to another. 


MESSAGE IN THE MAIL 


A communication mailbox usually serves to link two 
or more loosely coupled devices in a multiprogramming 
system. With the help of a first-in, first-out buffer, mes- 
sages from one device to another are queued in the mail- 
box. If the mailbox happens to be full, the sending process 
blocks data transfer until the mailbox has a slot free. If the 
mailbox is empty, the receiving process is blocked until 
the mailbox receives a message from the sending end. 
Otherwise, the sending and receiving processes run 
concurrently. 

3. The FIFO buffer can function as an instruction-prefetch 
queue by coupling it with a separate instruction-fetch 
unit. The CPU runs faster by reading repetitive 
instruction loops from the byte queue without 
addressing main memory. 

When devices are run on separate processors in a multiprocessor 
system, a hardware mailbox is needed. The 
Am29338 can help create such mailboxes (Fig. 5), serving 
to transfer variable-length messages from one processor 
to another. 

In this design example, two AmPAL16R4 programmable-logic 
arrays serve as the interface to the Am29338, 
one each for the sending and receiving processors. The arrays 
serve as a conduit to examine the status of the FIFO 
buffer and also enable a programmable interrupt. In operation, 
the processor wishing to send a message to the 
mailbox calls a special operating-system routine. This 
routine first reads the status of the mailbox; if it is not full, 
the message is written. Then the routine returns to the 
calling process. If the mailbox is full, the operating-system 
routine blocks the calling process and enables interrupts 
from the mailbox. When a slot becomes available, 
the sending processor is interrupted. The interrupt routine 
sends the message, disables interrupts from the mailbox, 
and unblocks the sending process. The receiving side of 
the mailbox, of course, operates in an inverse manner. 

[Fig. 4 diagram: (a) four Am29338 devices with POS inputs, ordered from most significant to least significant; (b) byte queues 3 through 0, showing the internal queue (Q) and dequeue (DQ) pointers.] 

4. Up to four FIFO buffers can be horizontally cascaded 
to support large word-width computer applications. 
Up to four devices can create one 128-bit 
word or a combination of 8-bit bytes (a). Buffers are 
combined by offsetting the internal queue and dequeue 
pointers. 

From the practical standpoint, the state of the mailbox 
is first examined by asserting the Chip Select (CS), Read/ 
Write (R/W) and Control/Data (C/D) lines of the ap- 
propriate PAL device and monitoring the buffer’s Full 
flag. An interrupt enable can then be written by bringing 
the R/W line low. The actual message may be transmit- 
ted from the processor to the mailbox by bringing the 
PAL’s CS and R/W lines low. 

Conversely, messages from the mailbox are sent to the 
receiving end by asserting CS and R/W of the appropriate 
PAL device, and bringing its C/D line low. The mailbox 
status is examined by asserting CS, R/W and C/D. 
The interrupt-enable bit can be written by bringing CS 
and C/D high, and R/W low. 

The mailbox, finally, can be extended to operate in a 
heterogeneous multiprocessing system. In that system, 
processes with both disparate data-block widths and 
clock frequencies are interconnected—an easy task for 
this FIFO buffer. 





SYNCHRONOUS OR ASYNCHRONOUS OPERATION 


The Am29338 operates as most FIFO buffers do in the 
asynchronous mode, as well as in the synchronous mode. 
For the asynchronous mode, the Queue Clock input 
(QCLK) and DQCLK lines serve as strobes to queue or 
dequeue data and are generally independent of one another. 
As a result, the buffer can connect two asynchronous 
subsystems or connect to an asynchronous bus such as the 
VMEbus. 

In a synchronous system, however, Enable signals are 
easier to generate than strobes. Thus, the QCLK and 
DQCLK signals may be simply derived from the com- 
mon subsystem clock. Queueing and dequeueing may 
then be ordered with the Queue Enable (QEN) and De- 
queue Enable (DQEN) inputs. This technique makes it 
easy to interface the buffer to a single subsystem or syn- 
chronous bus, such as Multibus II. 

As long as the FIFO buffer is neither full nor empty, 
the rates at which data flows in and out of the buffer are 
independent of each other. The user stays abreast of the 
chip buffers’ states by means of four status indicators: 
Full, Almost Full (A-Full), Empty, and Almost Empty 
(A-Empty). This is the role of the byte-count output. 

Besides the basic flags such as Full and Empty for indi- 
cating chip state, the Am29338 supplies indicators to 
warn of the exact condition of its buffers. The A-Full and 
A-Empty outputs, for example, show that there are less 
than 4 bytes of space available, or more than 4 bytes of 
data in the buffer. These indicators, like Full and Empty, 
are valid only for synchronous operation. 

Finer control over the amount of data stored is possible 
with the 7-bit Byte Count output, which monitors the 
number of bytes currently in the buffer. Unlike the other 
status indicators, Byte Count is valid only in the synchronous 
mode. In asynchronous operation, Byte Count is 
undefined. 

An example of applying the Byte Count indicator is il- 
lustrated by its use in control tasks. For instance, various 
system devices may need some minimum amount of data 
on hand before a given function can be carried out. In this 
particular case, an external comparator informs the sys- 
tem that the required information is indeed in the buffer. 

In all operations, the chip is first initialized by bringing 
the Reset line low. In tasks like instruction-prefetch 
queues, asserting Reset flushes the queue when a jump or 
branch instruction occurs. This action discards any pre- 
fetched instructions. 


DATA-BIT MECHANICS 


The number of bytes to be queued into the buffer is set 
by means of the Bytes Queued (BQ) inputs, and the corre- 
sponding data is presented to the data (D) and data parity 
(PD) inputs aligned to the least significant byte. When 
the QEN line is asserted, data will be entered on the fall- 
ing edge of the QCLK input. The device’s internal point- 
ers will then be updated on the low-to-high transition of 
the clock. 

The number of bytes to be dequeued is determined by 
the Bytes Dequeued (BDQ) input. If the Dequeue Enable 
line (DQEN) is brought low, the state of the byte queue is 
updated and data is off-loaded on the low-to-high transi- 
tion of the DQCLK signal. 

When the Output Enable line (OE) goes low, the next 
four bytes available for unloading and their correspond- 
ing parity bits are brought out on the data output (Y) and 
data parity (PY) lines. When OE moves high, the Y and 
PY pins assume a high-impedance state. 
As mentioned earlier, the chip relies on byte-wise parity 
checking for error detection. Parity bits are checked 
at the input, stored with the data, and checked again at 
the output. Dual checking lends great flexibility to the 
error-checking operation. In a task involving an 
instruction-prefetch queue, for example, the designer may 
choose to check parity only at the output. Then, only executed 
instructions are checked. As a result, instructions 
that were prefetched but never used (such as those prefetched 
after a jump operation) will not cause spurious 
interrupts. 

In typical operation, the data input parity-error output 
(PDERR) will go high if any of the bytes being queued 
have a parity error. The output parity-error line 
(PYERR) goes high if any of the bytes on the output bus 
have a parity error. Only valid bytes are checked for data 
anomalies; bytes on the data-input bus which are not being 
queued or undefined bytes which are sent out when 
the byte queue is almost empty are not included in the 
checking for errors. 
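
The rule that only valid bytes take part in the check can be sketched as follows. Even parity is an assumption here (the article does not say which polarity the device uses), and the function names are hypothetical.

```python
# Byte-wise parity check restricted to the valid bytes of a 4-byte bus.
# Even parity is assumed: data bits plus parity bit contain an even
# number of ones.

def parity_bit(byte):
    """Parity bit that makes the 9-bit total even."""
    return bin(byte).count("1") & 1


def parity_error(data, parity, valid):
    """True if any of the first `valid` bytes disagrees with its parity bit."""
    return any(parity_bit(data[i]) != parity[i] for i in range(valid))


bus_data = [0x03, 0x01, 0xFF, 0x00]
bus_parity = [0, 1, 0, 1]  # the parity for the last byte is wrong on purpose

print(parity_error(bus_data, bus_parity, valid=3))  # bad byte is not queued
print(parity_error(bus_data, bus_parity, valid=4))  # bad byte is included
```

With three valid bytes the corrupted fourth byte is ignored, which mirrors how unused bus bytes are kept from raising spurious errors.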


Tim Olson, a senior planning engineer at Advanced Micro 
Devices, is in charge of developing microprocessor architec- 
tures and Am29300 family building blocks. Olson has a 
BSEE-computer science degree from the University of Colorado 
at Boulder and an MSEE from the University of Arizona 
at Tucson. 
5. Circuitry for a simple hardware mailbox needs only one Am29338 FIFO buffer and two programmable-logic 
arrays for links to transmit and receive controllers. Three signal lines collectively check chip status and 
control information flow: CS, R/W, and C/D. A fourth line (IREQ) indicates interrupt requests. 
6.10 DIGITAL SYSTEMS VME 29300-1 


Digital Systems offers the VME-29300-1, an Am29300-
Family-based CPU, designed for those applications 
requiring the high performance of a 32-bit processor. 
Intended for use in emulating other computers or 
special-purpose computing such as graphics, encoding/decoding, 
and data reduction, the processor can be supplied 
with or without firmware. Its key features are: 

• 100 ns per micro-instruction 
• 4K words of Writable-Control-Storage 
• 88-bit-wide microcode loaded from 27512 EPROM 
• On-board firmware address lights (single-stepping provided) 
• N-way branching up to 64 ways 
• 64 registers, 32 bits, 3-ported 
• Calculated register address to 16-way 
• Handles all seven interrupt levels 
• Under firmware control: A16/A24/A32 and D8/D16/D32 


Introduction 


The VME-29300-1 CPU comes in a double-high two- 
board set. Both boards have P1 and P2 connectors for 
backplane connections, and in addition, control lines are 
interconnected between boards using two ribbon cables. 
The Instruction Board contains the Am29331 Se- 
quencer, address read-out, microprogram memory, 
pipeline registers, and writable-control-storage circuitry. 


The Arithmetic Board contains the Am29332 ALU, the 
Am29334 Register File, the calculation registers and 
latches, the constants ROM, and the address and data I/O 
circuitry. Board positions and spacing within the VME 
rack can be customized. 


Am29331—Microprogram Sequencer 


The Am29331 chip is configured as a 12-bit micropro- 
gram sequencer. The sequencer has multiway branch 
instructions that allow 1-of-N consecutive addresses to 
be selected as the branch target in a single cycle. The N- 
way branching can be chosen as 4-way, 8-way, 16-way, 
or 64-way by the microcode. Combinations of M, A, and 
D input lines of the Am29331 are used for this choice. A 
stack within the sequencer stores return addresses, loop 
addresses, and loop counts. It has 33 levels to permit the 
deep nesting of subroutines and loops. The lower 12 
output lines address the 4096-word microprogram 
memory, each word of which has a width of 88 bits. (The 
upper 4 address bits are not used.) Output data from the 
memory are fed to the pipeline registers. 





Writable-Control-Storage 


The Writable-Control-Storage (WCS) circuitry consists 
of a 27512 EPROM and the associated circuitry to control 
loading. At power-on time, the loader brings the micro- 
program into the 4Kx88 random-access memory, step- 
ping the Am29331 sequencer through a series of ad- 
dresses. Then each word of the microprogram is 
checked back against the EPROM bit pattern. When this 
task is complete, the WCS loader is disabled and the 
sequencer takes control. For debugging purposes the 
microprogram can be single-stepped, and the WCS 
loader again controls the Am29331 sequencer. The 
address readout displays each address (in a readable 
fashion) during single-stepping. 
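
The load-and-verify sequence can be sketched as below. An 88-bit word is 11 bytes, so 4096 words need 45,056 bytes, which fits in the 65,536 bytes of a 27512. The byte layout (11 consecutive EPROM bytes per word, most significant byte first) and the function names are assumptions; the actual loader's addressing scheme is not given in the text.

```python
# Sketch of loading a 4K x 88-bit control store from a byte-wide EPROM
# and verifying it afterwards. The layout is an assumed one.

WORDS = 4096
BYTES_PER_WORD = 11  # 88 bits


def load_wcs(eprom):
    """Copy the microprogram into RAM, one 88-bit word per address."""
    if len(eprom) < WORDS * BYTES_PER_WORD:
        raise ValueError("EPROM image too small")
    wcs = []
    for addr in range(WORDS):
        start = addr * BYTES_PER_WORD
        wcs.append(int.from_bytes(eprom[start:start + BYTES_PER_WORD], "big"))
    return wcs


def verify_wcs(wcs, eprom):
    """Check every loaded word back against the EPROM bit pattern."""
    for addr in range(WORDS):
        start = addr * BYTES_PER_WORD
        expected = eprom[start:start + BYTES_PER_WORD]
        if wcs[addr].to_bytes(BYTES_PER_WORD, "big") != expected:
            return False
    return True


image = bytes((i * 7) & 0xFF for i in range(65536))  # dummy 64K x 8 image
store = load_wcs(image)
print(len(store), verify_wcs(store, image))
```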


Am29334—Register File 


The two Am29334 chips serve as a 64x32 external 
register file for the ALU. Each of these is a high-speed, 
random-access memory configured with one write port 
(D) and two read ports (A,B). The D port is fed from the 
32-bit wide Y bus, while the A port feeds the MA bus and 
the B port feeds the CB bus. Control of write operations 
is done with the common write enable to each chip. This 
allows the lower-16 or upper-16 bits to be stored sepa- 
rately and gives the four different write options: 


• Write no data at all 
• Write only the lower 16 bits 
• Write only the upper 16 bits 
• Write all 32 bits simultaneously 


Read operations are controlled by a common output 
enable for reading all 32 bits to the A or B port. The A 
address bus originates in the writable control store 
(WCS) while the B and D address buses originate in the 
address calculation circuitry. By calculating the B and D 
addresses the CPU achieves a high degree of micropro- 
gram flexibility. 


Am29332—ALU 


The Arithmetic Logic Unit (ALU) processes 32-bit-wide 
data paths. This means that it allows one-, two-, three-, 
or four-byte data in arithmetic and logic operations as 
well as multiprecision arithmetic and multiple-bit shift 
operations. The data flow uses two input buses, MA and 
CB, and one output bus, Y. Operation on data of variable 
byte length, variable-length bit fields, or even single bits 
is made possible by the internal mask generator. This 
circuit creates a 32-bit mask for each instruction while 
using no overhead time. The mask is used as an additional 
operand in each instruction to allow operation on 
the selected data widths. Instructions that operate on 
variable-length bit fields require a mask that is a contiguous 
string of 1s for all selected bit positions and 0s for all 
unselected bit positions. In cases where the field exceeds 
the 32-bit boundary, the mask does not wrap 
around, allowing operation on a contiguous field across 
a word boundary. 

[Block diagram labels: Condition Code; Am29331 Sequencer; 4K x 88 Bits Microprogram Memory; Pipeline Registers; VME Bus Controls] 


For most single-operand instructions, the unselected bit 
positions pass the corresponding bits of the operand 
unmodified. For most two-operand instructions, the 
unselected bit positions pass the corresponding bits of 
the operand on the CB input unmodified. Thus, for two-operand 
instructions the mask allows the merging of the 
two operands in a single cycle. In addition to being used 
internally, the mask can be sent out over the Y bus as a 
pattern for testing purposes. 
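
A short sketch may help show how a field mask and the merge behavior fit together. Bit positions are numbered from the least significant bit, and the operand named `cb` plays the role of the CB input; both conventions, along with the function names, are assumptions for illustration rather than the Am29332's documented encoding.

```python
# Field mask generation and masked merge on 32-bit words.
# Numbering and operand roles are illustrative assumptions.

WORD = 0xFFFFFFFF


def field_mask(position, width):
    """Contiguous 1s for the selected field; bits past bit 31 are dropped
    instead of wrapping around."""
    return (((1 << width) - 1) << position) & WORD


def masked_op(op, ma, cb, mask):
    """Selected bits take the operation result; unselected bits pass cb."""
    return (op(ma, cb) & mask) | (cb & ~mask & WORD)


print(hex(field_mask(4, 8)))   # eight 1s starting at bit 4
print(hex(field_mask(28, 8)))  # field truncated at the word boundary
# Insert one byte of ma into cb in a single step:
print(hex(masked_op(lambda a, b: a, 0x12345678, 0xAAAAAAAA, field_mask(8, 8))))
```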


The Am29332 uses a funnel shifter with two 32-bit input 
ports and one 32-bit output port. This circuit can perform 
all of the operations of a barrel shifter (one N-bit input port 
and one N-bit output port) extended to two operands 
instead of one. Such a circuit is used to shift or rotate the 
operand up or down from 0 to 32 bits in a single cycle. 
This is very useful in operations such as the normaliza- 
tion of a mantissa for floating-point arithmetic or in 
applications where the packing and unpacking of data 
are frequent operations. In addition, it can extract a 32-bit 
contiguous field across the two operands, a function 
which is very useful in some graphics applications. Also, 
any of its operations can be followed by a logical opera- 
tion with both completed in a single cycle. 
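
The funnel shifter's behavior can be modeled by concatenating the two 32-bit inputs into a 64-bit value and selecting a 32-bit window. The function name and the choice of which operand forms the upper half are assumptions; shifts, rotates, and field extraction then fall out as special cases.

```python
# Funnel shifter model: two 32-bit inputs, one 32-bit output.

WORD = 0xFFFFFFFF


def funnel(hi, lo, shift):
    """Return the 32-bit window starting `shift` bits up from bit 0 of lo."""
    if not 0 <= shift <= 32:
        raise ValueError("shift must be 0..32")
    return (((hi & WORD) << 32 | (lo & WORD)) >> shift) & WORD


x = 0x12345678
print(hex(funnel(0, x, 8)))        # shift down by 8
print(hex(funnel(x, 0, 32 - 8)))   # shift up by 8
print(hex(funnel(x, x, 8)))        # rotate down by 8 (barrel shift)
print(hex(funnel(0x0000ABCD, 0xEF010000, 16)))  # field spanning both inputs
```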
The Am29332 easily handles prioritization, which is useful 
in controlling N-way branches, performing normalizations, 
and in graphic operations such as polygon fills. The 
built-in priority encoder sends out a 5-bit binary weighted 
code that signifies the relative position of the most 
significant 1 of the byte width selected. This allows 
prioritization on either 8-, 16-, 24-, or 32-bit operands. 
The priority encoder output can be passed onto the Y bus 
or stored in the status register. 


The Complete VME-29300-1 


The VME-29300-1 is a complete 32-bit processor when 
firmware is in place. It will operate on the VMEbus as a 
master or an interrupt-handler. Since it is not a fixed- 
instruction-set processor, firmware must be designed for 
proper operation. However, this is its outstanding advan- 
tage over other processors. Firmware options are almost 
limitless, giving the processor its high degree of adapta- 
bility to virtually any computing job. Chief among the 
suitable applications of this CPU is its ability to emulate 
other computing systems. This capability is not limited to 
32-bit processors, of course. Eight-bit and 16-bit systems 
are also easily emulated. Other complex computing jobs 
are also possible such as reducing large amounts of data 
and executing graphics programs. 


Digital Systems will design the firmware and deliver it 
with your system or provide design advice at an hourly 
rate by phone call or site visit. 


12-bit Microprogram Sequencer 


¢ Provides 100-ns microcycle time to support 32-bit 
high performance system 


¢ Supports 4-way, sae 16-way, and 64-way 
branching chosen by the microcode 


¢ Contains built-in conditional test logic for use with 
the ALU status bits 


¢ A 33-level stack provides support for loops and 
subroutine nesting 


* Supports single-stepping for the purpose of 
debugging 


¢ 12-bit address readout provided 
Microprogram Memory 


¢ Provides 4096-word capacity with a word width of 
88 bits of writable-control-storage 


© A 27512 EPROM allows customized firmware to 
be easily replaced or modified 


Register File 


¢ Two cascaded high-speed RAM chips for 64x32- 
bit register capacity 


° Write control allows independent lower-16 or 
upper-16 bits of storage 


¢ Provides one WRITE port (D) and two READ 
ports (A, B) and four WRITE options 


¢ Calculated B and D addresses provide high 
degree of microprogram flexibility 
ALU 


• A combinatorial architecture with equal cycle time 
for all instructions, two input ports, and one 
output port 


• Funnel shifter allows N-bit shift-up, shift-down, 
32-bit barrel shift or 32-bit field extract 


• Supports one-, two-, three-, and four-byte data 
for all operations and variable length fields for 
logical operations 


VME Characteristics 


• Double-high, two-board set occupies 4 slots 


• Power requirements: +5 VDC @ 3 A (max), +12 
VDC @ 0 A, -12 VDC @ 0 A 


• Operating range: 0-70°C, 80% relative humidity, 
forced cooling required 


• Interrupt handler options: 1-7 


• Requester option: R(3) used 


• Master data transfer options: A16/A24/A32 and 
D8/D16/D32 


Additional information is available upon request from: 
Digital Systems Corporation 

3 North Main Street 

Walkersville, MD 21793 

(301)845-4141 
7.1 THE Am29300/29C300 TIMING ANALYSIS 


With the Am29300, you can construct a system with a 
family cycle time of 80 ns or faster. This is especially true 
with the Am29300A. This section discusses the various 
critical paths in determining the fastest family cycle time. 
The following system configuration was assumed: 


Control Path 


Am29331/29C331 16-bit Microprogram Sequencer 
Am29818A Pipeline Register 

Am99C68 Control Memory 

Am27S55A Registered PROM 

Data Path 

Am29332/29C332 32-Bit ALU 

Am29334/29C334 64 x 18 Dual Port Register File 
Am29818A Status Register 


Non-Pipelined Operation 


The block diagram surrounding the Am29300/29C300 
family is shown in Figure 7-1 and its critical timing 
analysis is described in Tables 7-1 and 7-2. This timing 
analysis shows that a system cycle time of 75 ns is 
possible with the Am29300/29300A family, and 90 ns is 
possible with the Am29C300/29C300-1 family. The 
summary of the performance is listed in Table 7-5. 
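The way the critical loops set the cycle time can be sketched in a few lines of Python. This is only an illustration: the delays are transcribed from Table 7-1, with the registered-PROM A-Q delay taken as 20 ns (the value consistent with the printed loop totals), and the names `LOOPS`, `loop_totals`, and `cycle_time` are hypothetical.

```python
# Per-loop path delays in ns as (Am29300, Am29300A), transcribed from Table 7-1.
# Assumption: RPROM A-Q is 20 ns, which matches the printed loop totals.
LOOPS = {
    1: [(10, 10), (19, 17), (20, 20)],            # pipeline reg, sequencer D-Y, RPROM
    2: [(10, 10), (25, 22), (20, 20)],            # pipeline reg, sequencer I-Y, RPROM
    3: [(11, 11), (25, 22), (20, 20)],            # status reg, sequencer T-Y, RPROM
    4: [(10, 10), (47, 40), (9, 9)],              # pipeline reg, ALU I-Y, reg file setup
    5: [(10, 10), (48, 41), (6, 6)],              # pipeline reg, ALU I-flags, status setup
    6: [(10, 10), (24, 24), (43, 37), (6, 6)],    # reg file read, ALU D-flags, status setup
    7: [(10, 10), (24, 24), (35, 30), (9, 9)],    # reg file read, ALU D-Y, reg file setup
}


def loop_totals(loops, column):
    """Sum the delays of each loop for one speed grade (0 = Am29300, 1 = Am29300A)."""
    return {n: sum(step[column] for step in steps) for n, steps in loops.items()}


def cycle_time(loops, column):
    """The minimum cycle time is set by the slowest loop."""
    return max(loop_totals(loops, column).values())


print(cycle_time(LOOPS, 0), cycle_time(LOOPS, 1))
```

Under these assumptions loop 6 (register file read, ALU status, status register setup) is the limiting path for both speed grades.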


Pipelined Operation 


With the two pipelined stages in the Am29C334 
(PIPE=HIGH), you can construct pipelined systems 
with the Am29C300. As an example of this operation, 
the following describes a double-pipelined system. In this 
example, the Am27S55A registered PROM is utilized 
to improve the control path. Figure 7-2 shows an 
example of the pipelined system. 


Writing the Data into the Register File 


It takes two cycles to write data into the register file. In the 
first cycle, the data from the main memory is latched into 
the input pipeline register. Then in the second cycle, the 
data is written into the RAM location in the Am29C334. 
(See cycles 1-2 in Table 7-3.) 


Data Calculation and Storage 


In the first cycle, data (A1) to be operated upon is latched 
from the RAM location onto the output pipeline register of 
the Am29C334. In the second cycle, the operation is 
performed on the data (A1,B1) by the Am29C332. The 
result (C1) is then set up on the input pipeline register of 
the Am29C334. In the last cycle, the result is written into 
the RAM location of the Am29C334. For an example, 
refer to cycles 3-6 of Table 7-3. 


The second of these data path cycles is the most critical of the 
three. The maximum propagation delay incurred in this 
cycle then has to be compared with the maximum 
control path timing. The cycle time is determined by the 
longer of the two. The speed and choice of the main 
memory has to be based on the cycle time. 


It is possible to time-share the above two operations. In 
other words, data can be written into the register file at 
the same time the operation is performed on the data 
from the register file. See Table 7-3 for an example. 


Table 7-4 shows the calculation of the pipelined 
Am29C300 system. As you notice, testing of the ALU 
status through the Am29C331 is critical for the control 
path, and the data path involving I-Y of the Am29C332 is 
also critical. The table shows that the data path deter- 
mines the cycle time. The result is shown in Table 7-5. 
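The comparison between the control path and the data path can be written out as a short Python sketch. The delays are those listed in Table 7-4; the dictionary and function names are hypothetical and chosen only for this illustration.

```python
# Delays in ns as (Am29C300, Am29C300-1), taken from Table 7-4.
CONTROL_PATH = {
    "Am29818A CP-Q": (11, 11),
    "Am29C331 T-Y": (24, 22),
    "Am27S55A address setup": (20, 20),
    "Am29818A D-CP": (6, 6),
}
DATA_PATH = {
    "Am29818A CP-Q": (11, 11),
    "Am29C332 I-Y": (66, 47),
    "Am29C334 D-CP": (15, 13),
}


def path_delay(path, grade):
    """Total delay of one path for a speed grade (0 = Am29C300, 1 = Am29C300-1)."""
    return sum(delays[grade] for delays in path.values())


def pipelined_cycle_time(grade):
    """The longer of the control path and the data path sets the cycle time."""
    return max(path_delay(CONTROL_PATH, grade), path_delay(DATA_PATH, grade))


for grade, name in enumerate(("Am29C300", "Am29C300-1")):
    print(name, path_delay(CONTROL_PATH, grade), path_delay(DATA_PATH, grade),
          pipelined_cycle_time(grade))
```

In both speed grades the data path total exceeds the control path total, which is why the ALU I-Y delay dominates the pipelined cycle time.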


It is quite possible to improve the cycle time further with 
combinations of the Am29300, Am29300A, Am29C300, 
and Am29C300-1. 
Figure 7-1. Am29300/29C300 System Timing Analysis 
Table 7-1. Bipolar Am29300 Timing Analysis 


Loop  Device      Description      Path         Am29300  Am29300A³
                                                 (ns)      (ns)

1     Am27S55A¹   Pipeline Reg.    CP-Q           10        10
      Am29331     Sequencer        D-Y            19        17
      Am27S55A    RPROM            A-Q            20        20
                                   Total:         49        47

2     Am27S55A    Pipeline Reg.    CP-Q           10        10
      Am29331     Sequencer        I-Y            25        22
      Am27S55A    RPROM            A-Q            20        20
                                   Total:         55        52

3     Am29818A²   Status Register  CP-Q           11        11
      Am29331     Sequencer        T-Y            25        22
      Am27S55A    RPROM            A-Q            20        20
                                   Total:         56        53

4     Am27S55A    Pipeline Reg.    CP-Q           10        10
      Am29332     ALU              I-Y            47        40
      Am29334     Reg. File        D-CP            9         9
                                   Total:         66        59

5     Am27S55A    Pipeline Reg.    CP-Q           10        10
      Am29332     ALU              I-C,Z,N,L      48        41
      Am29818A    Status Reg.      Y-CP            6         6
                                   Total:         64        57

6     Am27S55A    Pipeline Reg.    CP-Q           10        10
      Am29334     Reg. File        A-Y            24        24
      Am29332     ALU              D-C,Z,N,L      43        37
      Am29818A    Status Reg.      D-CP            6         6
                                   Total:         83        77

7     Am27S55A    Pipeline Reg.    CP-Q           10        10
      Am29334     Reg. File        A-Y            24        24
      Am29332     ALU              D-Y            35        30
      Am29334     Reg. File        D-CP            9         9
                                   Total:         78        73


Notes: 1. In this timing analysis, a registered PROM is used to store microcode. WCS can also be implemented as a 
replacement for the registered PROM. 
2. The specifications can be improved by choice of the pipeline registers. 
3. This is only applicable for the Am29331A and the Am29332A. 
Table 7-2. CMOS Am29C300 Timing Analysis (Non-pipelined Mode) 


Loop  Devices in Path

1     Am29818A², Am29C331, Am99C68¹, Am29818A
2     Am29818A, Am29C331, Am99C68, Am29818A
3     Am29818A, Am29C331, Am99C68, Am29818A
4     Am29818A, Am29C332, Am29C334
5     Am29818A, Am29C332, Am29818A
6     Am29818A, Am29C334, Am29C332, Am29818A
7     Am29818A, Am29C334, Am29C332, Am29C334


Notes: 1. WCS is used to store microcode. The registered PROM can be utilized as a replacement for the WCS. 
2. The specifications can be improved by choice of the pipeline register. 
3. An external register is used to store status output of the ALU. If the internal status register is used, the cycle 
time will be faster by eliminating the setup time of the external register. 
Table 7-3. Pipelined Timing Sequence (Data Path) 


Cycle                     1       2        3        4        5        6

Am29C334  I/P             A1¹     A2       A3       A4       A5/C1    A6/C2
          RAM (write)             A1       A2       A3       A4       A5/C1
          RAM (read)              A1/B1²   A2/B2    A3/B3    A4/B4    A5/B5
          O/P                              A1/B1    A2/B2    A3/B3    A4/B4

Am29C332  ALU                                       C1       C2       C3


Legend: I/P = Input Pipeline Register 
O/P = Output Pipeline Register 
Ci = Ai op Bi (op = Am29C332 Operation) 


Notes: 1. For example, A1/B1 stands for (data derived from A port)/(data derived from B port). 
2. Assumption is made that data Bi is already stored in the Am29C334. 
Figure 7-2. Block Diagram 
Table 7-4. Pipelined Cycle Time Calculation 


Control Path               Am29C300  Am29C300-1
                             (ns)       (ns)
Am29818A   CP-Q               11         11
Am29C331   T-Y                24         22
Am27S55A   Add. Setup         20         20
Am29818A   D-CP                6          6
           Total:             61         59

Data Path                  Am29C300  Am29C300-1
                             (ns)       (ns)
Am29818A   CP-Q               11         11
Am29C332   I-Y                66         47
Am29C334   D-CP               15         13
           Total:             92         71


Table 7-5. Am29300/29C300 Family Cycle Time (ns) 

                 Am29300  Am29300A  Am29C300  Am29C300-1
Non-Pipelined      83        77       109        86
Pipelined          N/A       N/A       92        71


7.2 THERMAL CHARACTERISTICS/AIR FLOW 


DEFINITION OF THERMAL RESISTANCE 


The reliability of an integrated circuit is largely dependent on 
the maximum temperature which the device will attain during 
operation. Because the stability of a semiconductor junction 
declines with increasing temperature, knowledge of the ther- 
mal properties of the packaged device becomes an important 
factor during device design. In order to increase the operating 
lifetime of a given device, the junction temperatures must be 
minimized. This demands knowledge of the thermal resistance 
of the completed assembly and specification of the conditions 
in which the device will function properly. As devices become 
both smaller and more complex and the requirement for high 
speed operation becomes more important, heat dissipation 
will become an ever more critical parameter. 


Thermal resistance is defined as the temperature rise per unit 
power dissipation above some referenced condition. The unit 
of measure is typically °C/watt. The relationship between 
junction temperature and thermal resistance is given by: 


$$T_J = T_X + P_D\,\theta_{JX} \qquad (1)$$


where: $T_J$ = junction temperature 
$T_X$ = reference temperature 
$P_D$ = power dissipation 
$\theta_{JX}$ = thermal resistance 
X = some defined test condition 


In general, one of three conditions is defined for measurement 
of thermal resistance: 


$\theta_{JC}$ - thermal resistance measured with reference to the 
temperature at some specified point on the package surface. 


$\theta_{JA}$ (still air) - thermal resistance measured with respect to 
the temperature of a specified volume of still air. 


$\theta_{JA}$ (moving air) - thermal resistance measured with respect to 
the temperature of air moving at a specified velocity. 


The relationship between $\theta_{JC}$ and $\theta_{JA}$ is 


$$\theta_{JA} = \theta_{JC} + \theta_{CA}$$


where $\theta_{CA}$ is a measure of the heat dissipation due to natural 
convection (still air) or forced convection (moving air) and the 
effect of heat radiation and mounting techniques. $\theta_{JC}$ is 
dependent solely on material properties and package geometry; 
$\theta_{JA}$ includes the influence of the surface area of the 
package and environmental conditions. Each of these definitions 
of thermal resistance is an attempt to simulate some 
manner in which the packaged device may be used. 
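As an illustration of how these relationships are used, the following Python sketch evaluates equation 1 with a junction-to-ambient resistance built from a junction-to-case and a case-to-ambient term. All numeric values (3 °C/W, 12 °C/W, 4 W, 50°C ambient, 150°C limit) are hypothetical and are not taken from any data sheet; the function names are likewise only illustrative.

```python
def theta_ja(theta_jc, theta_ca):
    """Junction-to-ambient resistance as the sum of junction-to-case and case-to-ambient."""
    return theta_jc + theta_ca


def junction_temperature(t_ref_c, power_w, theta_c_per_w):
    """Equation 1: junction temperature = reference temperature + power * thermal resistance."""
    return t_ref_c + power_w * theta_c_per_w


def max_power(t_junction_max_c, t_ref_c, theta_c_per_w):
    """Equation 1 solved for power: the dissipation that just reaches the junction limit."""
    return (t_junction_max_c - t_ref_c) / theta_c_per_w


# Hypothetical example values, chosen only to show the arithmetic.
theta = theta_ja(theta_jc=3.0, theta_ca=12.0)   # 15 degC/W
tj = junction_temperature(t_ref_c=50.0, power_w=4.0, theta_c_per_w=theta)
print(theta, tj, max_power(150.0, 50.0, theta))
```

Because the case-to-ambient term depends on air flow and mounting, the same device will show a different junction temperature in still air than in moving air even at identical power.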


The thermal resistance of a packaged device, however 
measured, is a summation of the thermal resistances of the 
individual components of the assembly. These in turn are 
functions of the thermal conductivity of the component mate- 
rials and the geometry of the heat flow paths. Like other 
material properties, thermal conductivity is usually tempera- 
ture dependent. For alumina and silicon, two common pack- 
age materials, this dependence can amount to a 30% 
variation in thermal conductivity over the operating tempera- 
ture range of the device. The thermal resistance of a compo- 
nent is given by 


$$\theta = \frac{L}{K(T)\,A} \qquad (2)$$


where: L = length of the heat flow path 
A = cross sectional area of the heat flow path 
K(T) = thermal conductivity as a function of temperature 


and the overall thermal resistance of the assembly (discounting 
convective effects) will be: 


$$\theta = \sum_n \theta_n = \sum_n \frac{L_n}{K_n A_n}$$


but since the heat flow path through a component is influenced 
by the materials surrounding it, determination of L and 
A is not always straightforward. 
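A minimal numerical sketch of equation 2 and the series summation follows. It assumes a simple one-dimensional stack with the same cross section for every layer and no heat spreading, which is exactly the simplification the text warns about. The layer thicknesses, the 5 mm x 5 mm area, and the conductivity values are rough illustrative assumptions, not measured properties of any AMD package.

```python
def layer_resistance(length_m, conductivity_w_per_m_k, area_m2):
    """Equation 2: thermal resistance of one layer, in degC/W."""
    return length_m / (conductivity_w_per_m_k * area_m2)


def assembly_resistance(layers):
    """Series sum of the individual layer resistances (convection not included)."""
    return sum(layer_resistance(length, k, area) for _, length, k, area in layers)


AREA = 25e-6  # hypothetical 5 mm x 5 mm heat flow cross section, in m^2

# (name, path length in m, conductivity in W/(m K), area in m^2); illustrative values only.
LAYERS = [
    ("silicon die", 0.5e-3, 150.0, AREA),
    ("die attach", 25e-6, 2.0, AREA),
    ("alumina base", 1.0e-3, 25.0, AREA),
]

for name, length, k, area in LAYERS:
    print(name, round(layer_resistance(length, k, area), 3))
print("total", round(assembly_resistance(LAYERS), 3))
```

Even in this crude model the thin die attach layer contributes a noticeable share of the total, which is in line with the emphasis the text places on the quality of the die attach bond.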





A second factor that affects the thermal resistance of a 
packaged device is the power dissipation level and, more 
particularly, the relationship between power level and die 
geometry, i.e., power distribution and power density. By 
rearrangement of equation 1 to 


$$P_D = \frac{T_J - T_X}{\theta_{JX}} \qquad (3)$$


the relationship between $P_D$ and $T_J$ can be more clearly seen. 
Thus, to dissipate a greater quantity of heat for a given 
geometry, $T_J$ must increase and, since the individual $\theta_n$ will 
also increase with temperature, the increase in $T_J$ will not be a 
linear function of increasing power levels. 


A third factor of concern is the quality of the material 
interfaces. In terms of package construction, this relates 
specifically to the die attach bond, and for those packages 
having a heatsink, the heatsink attach bond. The quality of the 
die attach bond will most severely influence the package 
thermal resistance as this is the area which first impedes the 
transfer of heat out of the silicon die. Indeed, it seems likely 
that the initial thermal response of a powered device can be 
directly related to the quality of the die attach bond. 


EXPERIMENTAL METHOD 


The technique for measurement of thermal resistance involves 
the identification of a temperature-sensitive parameter on the 
device and monitoring this parameter while the device is 
powered. For bipolar integrated circuits the forward voltage of 
the substrate isolation diode provides a convenient parameter 
to measure and has the advantage of a linear dependence on 
temperature. MOS devices which do not have an accessible 
substrate diode present greater measurement difficulties and 
may require simulation through use of a specially designed 
thermal test die. Choice of the parameter to be measured 
must be made with some care to ensure that the results of the 
measurement are truly representative of the thermal state of 
the device being investigated. Thus measurement of the 
substrate isolation diode which is generally diffused across the 
area of the die yields a weighted average of the condition of 
the individual junctions across the die surface. Measurement 
of a more local source would yield a less generalized result. 


For MOS devices, simulation is accomplished using the thermal 
test die. The basis for this test die is a 25 mil square cell 
containing an isolated diode and a 1 kΩ resistor. The resistors 
are interconnected from cell to cell on the wafer before it is cut 
into multiple arrays of the basic unit cell. In use the device is 
powered via the resistors with voltage or current adjusted for 
the proper level and the voltage drop of the individual diodes is 
monitored as in the case of actual devices. 


Prior to the thermal resistance test, the diode voltage/ 
temperature calibration must be determined. This is done by 
measuring the forward voltage at 1 mA current level at two 
different temperatures. The diode calibration factor is then: 


$$K_F = \frac{T_2 - T_1}{V_2 - V_1} = \frac{\Delta T}{\Delta V} \qquad (4)$$


in units of °C/mV. For most diodes used for this test the 
voltage/temperature relationship is linear and these two 
measurement points are sufficient to determine the calibration. 





The actual thermal resistance measurement has two alternat- 
ing phases: measurement and power on. The device under 
test is pulse powered with an ON duty cycle of 99% and a 
repetition rate of < 100 Hz. During the brief OFF states the 
device is reverse-biased with a 1 mA current and the voltage 
drop is measured. The series of voltage readings are averaged 
over short periods and compared to the voltage reading 
obtained before the device was first powered ON. The thermal 
resistance is then computed as: 


$$\theta_{JX} = \frac{K_F\,(V_F - V_I)}{V_H\,I_H} = \frac{K_F\,\Delta V}{P_D} \qquad (5)$$


where: $K_F$ = calibration factor 
$V_I$ = initial forward voltage value 
$V_F$ = current forward voltage value 
$V_H$ = heating voltage 
$I_H$ = heating current 
The pulsing measurement is continued until the device has 
reached thermal equilibrium and the final value measured is 
the equilibrium thermal resistance of the device under test. 
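The two-step procedure (diode calibration, then the pulsed measurement) can be sketched in Python as follows. The diode voltages, heating voltage, and heating current are hypothetical readings invented for the example; only the arithmetic follows the calibration factor and thermal resistance definitions given above.

```python
def calibration_factor(t1_c, v1_mv, t2_c, v2_mv):
    """Diode calibration factor in degC/mV from two (temperature, forward voltage) points."""
    return (t2_c - t1_c) / (v2_mv - v1_mv)


def thermal_resistance(k_f, v_initial_mv, v_current_mv, heating_v, heating_a):
    """Temperature rise inferred from the diode voltage shift, divided by heating power."""
    temperature_rise_c = k_f * (v_current_mv - v_initial_mv)
    return temperature_rise_c / (heating_v * heating_a)


# Hypothetical calibration points: 650 mV at 25 degC and 450 mV at 125 degC.
k_f = calibration_factor(25.0, 650.0, 125.0, 450.0)      # negative: voltage falls as T rises

# Hypothetical equilibrium reading: diode voltage dropped from 650 mV to 600 mV
# while the device was heated with 5 V at 0.5 A (2.5 W).
theta = thermal_resistance(k_f, 650.0, 600.0, heating_v=5.0, heating_a=0.5)
print(k_f, theta)
```

The calibration factor is negative because the forward voltage of a diode decreases with temperature; the voltage shift is negative as well, so the computed resistance comes out positive.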


When the end result desired is $\theta_{JA}$ (still air), the device and the 
test fixture (typically a standard burn-in socket) are enclosed in 
a box containing approximately 1 cubic foot of air. For $\theta_{JC}$ 
measurements the device is attached to a large metal 
heatsink. This ensures that the reference point on the device 
surface is maintained at a constant temperature. The requirements 
for measurement of $\theta_{JA}$ (moving air) are rather more 
complex and involve the use of a small wind tunnel with 
capability for monitoring air pressure, temperature and velocity 
in the area immediately surrounding the device tested. 
Standardization of this last test requires much careful attention. 


[Figure: Waveforms for Pulsed Thermal Resistance Test (voltage and current traces)] 
Table 7-6. Am29300 Thermal Resistance (°C/W)¹ 


Devices: Am29334GC, Am29332GC, Am29331GC, Am29325GC 

Parameters: 
$\theta_{JA}$, Junction-to-Ambient, Still Air 
$\theta_{JA}$, 200 Linear Feet per Minute 
$\theta_{JA}$, 600 Linear Feet per Minute 
$\theta_{JC}$, Junction-to-Case² 


Notes: 1. The air flow should be measured at the vicinity of the heatsink. 
2. This is the measured value based on a 144-pin PGA with heatsink 
attached. The value should not vary significantly over the family. 
7.3 CMOS/BIPOLAR RELIABILITY 


Reliability Monitor Program 


AMD Specification 01-011 

The Reliability Monitor Program (RMP) is an extensive 
effort to measure the reliability of all process families at 
AMD on a regular basis. Typically 7,000 to 10,000 devices 
per month are tested in a variety of environmental stresses. 


The Reliability Monitor Program has two purposes: 


Improved Reliability Performance: Each reject found 
undergoes failure analysis. Results are used by AMD to 
identify and establish corrective actions to eliminate failure 
mechanisms. 


Generation of Reliability Data: Reliability results are 
utilized in many ways. Typical applications include assessing 
the benefits of burn-in, providing estimates of typical life- 
times, modeling field applications, and determining suita- 
bility of plastic and hermetic packaging in various 
temperature and humidity environments. This information 
is available to the customer. 


The stress tests employed are listed in Table 2: 


Table 2. Reliability Monitor Stress Conditions 


STRESS                  DURATION     SAMPLE   CONDITIONS
                                     SIZE     HERMETIC          PLASTIC

Early Life              160 hours    300      125°C             125°C or 85°C
Operating Life          1000 hours   120      150°C and 125°C   125°C or 85°C
Extended Operating      2000 hours   120      150°C and 125°C   125°C or 85°C
Life (Biannual)
Temperature Cycle       1000 cycles  50       -65°C to 150°C    -65°C to 150°C
Biased Temperature      1000 hours   50       N/A               85°C & 85% RH,
and Humidity                                                    5 V alt bias
Pressure Cooker         160 hours    50       N/A               121°C, 15 psig,
                                                                no bias


The results from the Reliability Monitor Program form the 
basis of the failure rate calculations presented in the 
appendix. 


The Estimation of Field Reliability 


In this section, a modeling procedure is described for esti- 
mating reliability under field conditions, based on the 
lifetest data generated in the Reliability Monitor Program. 
The summaries of the lifetest results and the actual failure 
rate projections are contained in the appendix. 


A General Reliability Model 


In order to evaluate the reliability of the product in the 
field, a general reliability model is utilized. The modeling 
procedure is described by authors Paul A. Tobias and 
David C. Trindade in the text Applied Reliability (New 
York: Van Nostrand Reinhold, 1986, pp. 173-182). 


The failure probability F(t) may be viewed as the proba- 
bility that a random unit drawn from the population fails 
by time t. Thus, F(t) may be represented in terms of a 
cumulative distribution function (CDF) of the times to 
failure. 


To understand the general reliability model it is useful to 
think of failures in terms of the three D's: dead, defective, 
or deficient. The general model encompasses (1) the dis- 
covery of functionally dead test escapes, (2) the defective 
subpopulations, and (3) the typical competing failure 
modes of the main population, which are typically indica- 
tive of design, material, or process deficiencies. 


The complete model for the field use CDF may be 
represented as: 


$$F_T = \alpha F_E + \beta F_D + (1 - \alpha - \beta) F_N,$$


where $F_E$ is the discovery distribution for the proportion $\alpha$ 
of test escapes, $F_D$ is the life distribution for the proportion 
$\beta$ of units in the defective subpopulations, and $F_N$ is the 
life distribution derived from the N typical competing 
failure modes. 


For $F_N$, the competing nature arises because a unit is 
viewed as a series system of different potential failure 
mechanisms such that the occurrence of any one failure 
mechanism results in failure of the unit. Thus, 
$F_N = 1 - R_1 R_2 R_3 \ldots R_N$, where $R_i$ is the reliability 
function for a specific failure mechanism. For the series model, 
failure rates at any point in time are additive. 


The distribution for the test escapes is not an actual life dis- 
tribution, but describes the application dependent rate at 
which the escapes may be discovered in use. This category 
also includes good units damaged in test or handling. 
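The structure of the general model can be illustrated with a short Python sketch. It only shows how the pieces combine: a series system of competing mechanisms and a weighted mixture of the three populations. The function names, the two exponential mechanisms, and their rates are hypothetical and serve only to demonstrate that failure rates add in the series model.

```python
import math


def series_cdf(reliabilities):
    """CDF of a series system: one minus the product of the mechanism reliabilities."""
    product = 1.0
    for r in reliabilities:
        product *= r
    return 1.0 - product


def field_cdf(alpha, f_escape, beta, f_defective, f_main):
    """Mixture of test escapes, defective subpopulations, and the main population."""
    return alpha * f_escape + beta * f_defective + (1.0 - alpha - beta) * f_main


# Two hypothetical constant-rate mechanisms (failures per hour), evaluated at t hours.
rate_1, rate_2, t = 2e-7, 5e-8, 10_000.0
f_main = series_cdf([math.exp(-rate_1 * t), math.exp(-rate_2 * t)])

# For constant rates the series CDF equals that of a single mechanism with the summed rate.
print(f_main, 1.0 - math.exp(-(rate_1 + rate_2) * t))

# Hypothetical proportions: no test escapes, 200 PPM defective units that have all failed.
print(field_cdf(alpha=0.0, f_escape=0.0, beta=200e-6, f_defective=1.0, f_main=f_main))
```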
Failure Distributions 


The lognormal and Weibull CDFs are the distributions 
most often used to represent reliability failure mechanisms. 
The exponential distribution, characterized by a constant 
failure rate, is a special case of the Weibull. The lognormal 
distribution is specified by two parameters: T50, the 
median time to failure, and sigma, the shape parameter. 
Similarly, the Weibull distribution, which can be written in 
closed form as $F(t) = 1 - \exp[-(t/c)^m]$, is characterized by 
a characteristic life c and a shape parameter m. The value 
of the shape parameter determines whether the failure 
rate is increasing (m>1), decreasing (m<1), or constant 
(m=1). The exponential distribution, $F(t) = 1 - \exp[-(t/c)]$, 
is specified completely by the one parameter c called the 
mean time to failure (MTTF). Figures below show failure 
rates for several values of the shape parameters of the 
lognormal and Weibull distributions, respectively. 
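The behavior of the two failure rate (hazard) functions can be checked numerically with the sketch below. It uses the closed-form Weibull expressions quoted above and the standard definition of the lognormal hazard as density divided by survival probability; the function names are hypothetical, and `statistics.NormalDist` from the Python standard library supplies the normal CDF and density.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def weibull_cdf(t, c, m):
    """F(t) = 1 - exp[-(t/c)^m] with characteristic life c and shape m."""
    return 1.0 - math.exp(-((t / c) ** m))


def weibull_hazard(t, c, m):
    """Weibull failure rate: (m/c) * (t/c)^(m-1)."""
    return (m / c) * (t / c) ** (m - 1)


def lognormal_cdf(t, t50, sigma):
    """Lognormal CDF with median T50 and shape sigma."""
    return _STD_NORMAL.cdf(math.log(t / t50) / sigma)


def lognormal_hazard(t, t50, sigma):
    """Lognormal failure rate: density divided by survival probability."""
    z = math.log(t / t50) / sigma
    density = _STD_NORMAL.pdf(z) / (sigma * t)
    return density / (1.0 - _STD_NORMAL.cdf(z))


# Characteristic life and T50 fixed at 1, as in the figures; the shape parameter varies.
for m in (0.5, 1.0, 2.0):
    print("Weibull m =", m, [round(weibull_hazard(t, 1.0, m), 3) for t in (0.5, 1.0, 2.0)])
for sigma in (0.5, 1.0, 3.0):
    print("lognormal sigma =", sigma,
          [round(lognormal_hazard(t, 1.0, sigma), 3) for t in (0.5, 1.0, 2.0)])
```

With m = 1 the Weibull hazard is flat, which is the exponential special case; values below and above 1 give falling and rising failure rates respectively.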


[Figure: Lognormal Failure Rate (Hazard), T50 = 1; hazard versus time] 

[Figure: Weibull Failure Rate (Hazard), Characteristic Life = 1; hazard versus time] 


For the general reliability model to be applied, the distri- 
butions and associated parameters must be determined, 
either through reliability studies or a review of the relia- 
bility literature. In addition, if the experimentation is 
performed under accelerated conditions, acceleration 
models are needed to relate the results to field use. For 
distributions such as the lognormal or Weibull, accelera- 
tion factors are applied to the scale parameter (such as 
the median or characteristic life, respectively), in order to 
generate a new scale parameter from which failure rates 
at various field conditions may be estimated. Under true 
linear acceleration, the type of distribution and the shape 
parameter do not change between stress and field 
conditions. 


Calculation of Failure Rates 


To estimate field failure rates from reliability studies, many 
factors must be considered. One primary requirement is 
the identification of individual failure mechanisms in order 
to ascribe the failures to the proper categories used in the 
general reliability model. 


Considerations and Assumptions 
1. The fraction of test escapes and the underlying discov- 
ery distribution: 


The fraction of test escapes and contributions from dam- 
age occurring as a result of testing and handling proce- 
dures at the vendor or customer are estimable only from 
actual field usage, since the underlying discovery distribu- 
tion is application dependent. To model these test escapes, 
a Weibull distribution with a decreasing failure rate may 
be used. In the appendix, test escapes, which represent an 
unknown early adder to the model, are assumed negligi- 
ble. Temperature acceleration considerations do not apply 
to test escapes since the units are basically inoperative. 


2. The fraction of defective subpopulations and the under- 
lying distribution: 


The lifetimes for the fraction defective subpopulations may 
be modeled by the exponential distribution. Reliability 
results from stress testing must be carefully analyzed in 
order to identify the true defect related failure modes. 
From such studies at AMD, the mean time to failure (MTTF) 
for the defective subpopulations has been found to be 
approximately 100 hours at 125°C. The fraction $\beta$ of 
product with defects is computed from the CDF estimate of 
defect related failures at readout time t by the following 
equation: 


$$\beta = \frac{\mathrm{CDF}}{1 - e^{-t/100}}.$$
To combine the results from lifetests at different temperatures 
or from dissimilar readout times, a pooled estimate 
of $\beta$ may be calculated as the weighted mean of the 
individual $\beta$ estimates. Sample size is the weighting factor. 
Based on the reliability literature, an activation energy of 
0.45 eV has been chosen as representative. 
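A sketch of the fraction-defective calculation and the sample-size-weighted pooling follows. The readout used in the example (1 defect related reject out of 6,403 units at 168 hours, none out of 2,655 units at 1000 hours) is chosen for illustration; the function names are hypothetical, and the 100-hour MTTF is the 125°C value quoted in the text.

```python
import math


def fraction_defective(cdf, readout_hours, mttf_hours=100.0):
    """Fraction of defective product implied by the observed CDF at one readout.

    Divides the observed CDF by the share of an exponential defective
    subpopulation (with the given MTTF) expected to have failed by the readout.
    """
    return cdf / (1.0 - math.exp(-readout_hours / mttf_hours))


def pooled_fraction(estimates):
    """Weighted mean of (fraction, sample_size) pairs, weighted by sample size."""
    total_units = sum(size for _, size in estimates)
    return sum(fraction * size for fraction, size in estimates) / total_units


# Illustrative readouts at 125 degC.
beta_168 = fraction_defective(cdf=1 / 6403, readout_hours=168.0)
beta_1000 = fraction_defective(cdf=0 / 2655, readout_hours=1000.0)

print(round(beta_168 * 1e6), "PPM at the 168-hour readout")
print(round(pooled_fraction([(beta_168, 6403), (beta_1000, 2655)]) * 1e6), "PPM pooled")
```

At 168 hours roughly 81% of a 100-hour-MTTF subpopulation would already have failed, so the correction to the raw CDF is modest; at short readouts it becomes much larger.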


3. The distributions of the competing failure mechanisms in 
the main population: 


Competing failure mechanisms may occur during either 
early fail or long term lifetesting. The distribution of life- 
times is modeled by a lognormal distribution with a sigma 
specific to each failure mechanism. The sigma value may 
be determined from the reliability literature and checked 
for reasonableness against values estimated from the 
data. Also from the reliability data giving the fraction 
failed for various mechanisms at stress readouts, the 
median time to fail (T50) at stress conditions may be 
estimated. To combine the results for a specific mechanism 
from several lifetests, a pooled median time to fail, 
weighted by sample size, is computed from the individual 
ln T50 estimates. 


The acceleration factors specific to a failure mechanism 
may be applied to the pooled stress T50 to estimate the 
field T50. This field median life estimate may then be used 
with the same sigma to estimate the expected CDF in the 
field for a given mechanism at a chosen time. The individual 
failure rates for each mechanism may be summed to 
arrive at the total device failure rate. 
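The lognormal steps described above can be sketched in Python under a few stated assumptions: a single readout per lifetest, a known sigma, and an acceleration factor supplied by the caller rather than derived here. The function names and the example numbers (1 reject in 1,000 units at 1000 hours, sigma 5.5, acceleration factor 10) are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ln_t50_from_readout(fraction_failed, readout_hours, sigma):
    """Solve F = Phi((ln t - ln T50) / sigma) for ln T50 at stress conditions."""
    return math.log(readout_hours) - sigma * _STD_NORMAL.inv_cdf(fraction_failed)


def pooled_ln_t50(estimates):
    """Sample-size-weighted mean of (ln T50, sample_size) pairs."""
    total_units = sum(size for _, size in estimates)
    return sum(value * size for value, size in estimates) / total_units


def field_cdf(ln_t50_stress, sigma, acceleration_factor, field_hours):
    """Scale the median by the acceleration factor, keep sigma, evaluate the field CDF."""
    ln_t50_field = ln_t50_stress + math.log(acceleration_factor)
    return _STD_NORMAL.cdf((math.log(field_hours) - ln_t50_field) / sigma)


# Hypothetical lifetest: 1 reject out of 1,000 units at the 1000-hour readout.
ln_t50 = ln_t50_from_readout(fraction_failed=1 / 1000, readout_hours=1000.0, sigma=5.5)
print("ln T50 at stress:", round(ln_t50, 2))
print("field CDF at 100,000 h:", field_cdf(ln_t50, 5.5, acceleration_factor=10.0,
                                           field_hours=100_000.0))
```

Keeping sigma fixed while shifting only the median is the "true linear acceleration" assumption: the distribution type and shape parameter are the same at stress and in the field.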


4. The treatment of zero rejects for a possible failure 
mechanism: 


Just because failures for a given mechanism are not 
observed does not mean such mechanisms are 
non-existent. The sample size may be insufficient or the 
acceleration may be inadequate to reveal all possible low level 
reliability concerns. In fact, if the potential failure 
mechanisms have low thermal activation energies, the 
demonstration of reliability performance may be limited by 
mechanisms with no observed failures! 


For example, time dependent dielectric breakdown 
(TDDB) for MOS devices has a lognormal distribution with 
sigma around 5.5 and activation energy of 0.3 eV. If no 
TDDB failures are observed in a HTOL stress, it is still pos- 
sible to calculate a non-zero, upper confidence level for 
the CDF based on the given sample size. The use of such 
a low activation energy may be a significant factor when 
combining failure rates across all possible mechanisms 
having higher activation energies. 
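One common way to obtain such a bound, shown here as an assumption since the text does not spell out the method, is the exact binomial argument: with zero failures in n units, the upper confidence limit on the CDF is the failure probability p for which observing zero failures would occur with probability 1 minus the confidence level. The function name and sample size in the sketch are illustrative.

```python
def zero_reject_upper_cdf(sample_size, confidence=0.5):
    """Upper confidence bound on the CDF when no failures are seen in sample_size units.

    Solves (1 - p) ** sample_size = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


# Example: no rejects among 1,477 units, bounded at 50% and 90% confidence.
for level in (0.5, 0.9):
    bound = zero_reject_upper_cdf(1477, level)
    print(level, round(bound * 1e6), "PPM")
```

The bound shrinks only in proportion to the sample size, so a mechanism with no observed failures and a low activation energy can still dominate the projected field failure rate.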


5. The incorporation of unknown failure mechanisms: 


Another significant factor in calculating failure rates is the 
manner in which unidentified mechanisms are incorpo- 
rated into the failure rate calculations. If the failure mech- 
anism is unknown, the rejects may be pooled into a 
category that uses fairly conservative activation energies 
of 0.3 eV for MOS and 0.5 eV for bipolar. Even though 
failure mechanisms are unidentified, it may still be possi- 
ble to estimate the lognormal sigmas from the data. 


6. Overall activation energies and the exponential 
distribution. 


In the reliability literature, it is common to see the use of 
overall activation energies, such as 0.7 eV for MOS and 
1.0 eV for bipolar technologies. In addition, the exponen- 
tial distribution is often assumed for all mechanisms. The 
use of an overall activation energy neglects those mech- 
anisms which are known to have lower activation energies 
and can result in estimates which are impressively low but 
may be misleading. Furthermore, the use of the exponen- 
tial distribution for all cases may also result in inaccurate 
projections, since it is well established in the literature that 
most failure rate mechanisms have non-constant failure 
rates. 
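The traditional projection criticized above can be sketched as follows. The Arrhenius form of the temperature acceleration factor is an assumption here (the text quotes activation energies but does not write out the model), as are the function names; the Boltzmann constant is the usual 8.617e-5 eV/K. The zero-reject failure rate bound uses the exponential distribution, for which the upper limit at confidence C is -ln(1 - C) divided by the accumulated device hours.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5


def arrhenius_acceleration(ea_ev, field_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between stress and field junction temperatures."""
    field_k = field_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / field_k - 1.0 / stress_k))


def equivalent_device_hours(sample_size, stress_hours, acceleration):
    """Stress device hours converted to equivalent hours at the field temperature."""
    return sample_size * stress_hours * acceleration


def zero_reject_fits(device_hours, confidence=0.6):
    """Upper bound on a constant failure rate with zero rejects, in FITs (per 1e9 hours)."""
    return -math.log(1.0 - confidence) / device_hours * 1e9


# An overall 0.7 eV versus a low 0.3 eV mechanism, 125 degC stress to 55 degC field.
print(round(arrhenius_acceleration(0.7, 55.0, 125.0), 1),
      round(arrhenius_acceleration(0.3, 55.0, 125.0), 1))

# Zero rejects over 23,591,799 equivalent device hours at 60% confidence.
print(round(zero_reject_fits(23_591_799)), "FITs")
```

The large gap between the two acceleration factors shows why a single overall activation energy can make projections look better than the low-energy mechanisms would justify.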
CMOS 


Channel Length: 1.5 µm    Gate Oxide Thickness: 250 Å    Metal Pitch: 4-6 µm 


Product Types Tested: Static RAMs - Am99C68, Am99C88 
Non-Volatile Memory Division - Am27C1024 
Microprocessor - Am29C10A 
Fixed Instruction Processor - Am82C288 


Data Summary and Failure Rate Estimation for General Reliability Model 


                          Test Results             Reliability Modeling        Average Failure Rate (AFR)
                          (Number of Rejects)      Parameters @ 55°C           FITs at 55°C
Failure                   168 hrs 1000 hrs 1000 hrs  Ea                          0-4     4-30    30-100
Mechanism                 125°C   125°C    150°C     (eV)                        khrs    khrs    khrs

Hermetic
Sample Size               6,403   2,655    1,477

Defective Subpopulations                                    MTTF     Fraction
                                                            (hrs)    Defective (PPM)
Cause not found           1       0        0         0.45   1645     178         41      1       0

Competing Mechanisms                                        Sigma    ln(T50)
Corroded Metal            0       2        0         0.50   2.5      18          13      39      53
Cracked Oxide             0       1        1         1.00   9.0      44          9       2       1
Ionic Contamination       1       0        0         1.00   9.0      45          6       1       1
Charge Gain/Loss          0       0        1         0.80   9.0      44          11      2       1
Oxide Pinholes            0       0        1         0.30   5.5      28          51      22      12
Cause not found           0       1        5         0.30   5.5      27          94      37      19
0 Rejects 50% conf.       0       0        0         0.30   5.5      28          38      17      9
Totals                    2       4        8                                     262     120     96

Plastic
Sample Size               516     216      0

Competing Mechanisms                                        Sigma    ln(T50)
0 Rejects 50% conf.       0       0        -         0.30   5.5      24          516     159     73
Totals                    0       0        -                                     516     159     73
[Figure: Instantaneous Failure Rate at Field Conditions. Curves Derived from General Reliability Model. 
Horizontal axis: Time (thousand hours). Legend: TOTAL HERMETIC, PLASTIC.] 
Traditional Method for Reliability Projection 
Single Exponential Distribution Assumed, Ea = 0.7 eV 
Stress Junction Temperature to Field Junction Temperature 

Package    Stress             Sample    Equivalent Device   Rejects   Failure Rate
Type                          Size      Hours at 55°C                 (60% Confidence) FITs

Hermetic   168 hrs 125°C      6,403     83,841,423          2
           1000 hrs 125°C     2,655     206,933,299         4
           1000 hrs 150°C     1,477     384,626,879         8
           Totals             10,535    675,401,600         14        23

Plastic    168 hrs 125°C      516       6,756,548           0
           1000 hrs 125°C     216       16,835,251          0
           Totals             732       23,591,799          0         39


Package Related Tests 

Stress         Package    Sample   Failure      Number of   Percent
               Type       Size     Mechanism    Rejects     Rejected

Temperature    Hermetic   150                   0           0.00
Pressure Pot   Plastic    50                    0           0.00
Totals                                          0           0.00
IMOX II 


Channel Length: N/A    Gate Oxide Thickness: N/A    Metal Pitch: 4-7 µm 


Product Types Tested: Bipolar RAM - Am93422, Am93L412, Am93L422, Am93L425 
Field Programmable Logic - AmPAL16H8, AmPAL16HD8, AmPAL16L8, AmPAL16L8L, AmPAL16R4, 
AmPAL16R4L, AmPAL16R6, AmPAL16R6L, AmPAL16R8, AmPAL16R8L, AmPAL22V10 
Bipolar PROM - Am27S25, Am27S29, Am27S31, Am27S33, Am27S181, Am27PS191, Am27S191 
Interface and Logic Products - Am29827, Am29828, Am29833, Am29841, Am29843, 
Am29844, Am29845, Am29853, Am29863, Am25LS14A 
Microprocessor - Am2901C, Am2910A, Am29705A 
Microcontroller - Am29116 
Peripheral Products - Am8177 


Data Summary and Failure Rate Estimation for General Reliability Model 


                                  Test Results                   Reliability Modeling        Average Failure Rate (AFR)
                                  Number of Rejects              Parameters                  FITs @ 55°C
Package Type / Term of Model /    168 hrs   1000 hrs  1000 hrs   Ea                          0-4     4-30    30-100
Failure Mechanism                 125°C     125°C     150°C      (eV)                        khrs    khrs    khrs

Hermetic
  Sample Size                     22,718    7,060     5,709
  Defective Subpopulations                                               MTTF    Fraction Defective
                                                                         (hrs)   (PPM)
    Damaged Metal                 1         0         0          0.45    848     50          13      0       0
    Foreign Material Oxide        3         0         0          0.45    848     151         38      0       0
    Wire Heel Broken              1         0         0          0.45    848     50          13      0       0
    Cause not Found               2         0         0          0.45    848     101         25      0       0
  Competing Mechanisms                                                   Sigma   ln(T50)
    Crystal Defects               1         0         0          0.70    9.0     45          5       1       1
    Cracked Oxide                 1         0         1          1.00    9.0     46          3       1       0
    0 Rejects, 50% conf.          0         0         0          0.50    4.0     24          8       8       6
  Totals                          9         0         1                                      103     10      7

Plastic
  Sample Size                     18,338    6,580     0
  Defective Subpopulations                                               MTTF    Fraction Defective
                                                                         (hrs)   (PPM)
    Glassivation Damaged          1         0         -          0.45    275     64          16      0       0
    Damaged Metal                 1         0         -          0.45    275     64          16      0       0
    Wire Clearance                1         0         -          0.45    275     64          16      0       0
    Cause not Found               3         0         -          0.45    275     191         48      0       0
  Competing Mechanisms                                                   Sigma   ln(T50)
    Ionic Contamination           1         0         -          1.00    9.0     46          4       1       0
    0 Rejects, 50% conf.          0         0         -          0.50    4.0     23          44      34      25
  Totals                          7         0         -                                      144     35      25
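
The defective-subpopulation rows above can be read as a small fraction of the population that fails with an exponential lifetime. The sketch below is one plausible reading of those rows, not AMD's documented model: it assumes the tabulated MTTF already applies at 55°C and that the AFR is the fraction failing in an interval divided by the interval length. The function name and signature are illustrative only. For the hermetic "Damaged Metal" row it gives roughly 12 FITs for 0-4 khrs and nearly zero afterwards, close to (but not exactly) the tabulated 13 and 0.

```python
import math


def subpopulation_afr_fits(fraction_defective, mttf_hours, t_start, t_end):
    """Average failure rate in FITs over [t_start, t_end] hours.

    Assumption (not stated in the source): defective units fail with an
    exponential lifetime of mean mttf_hours at field conditions.
    """
    def cumulative_failed(t):
        # Fraction of the whole population that has failed by time t.
        return fraction_defective * (1.0 - math.exp(-t / mttf_hours))

    failures_per_hour = (
        cumulative_failed(t_end) - cumulative_failed(t_start)
    ) / (t_end - t_start)
    return failures_per_hour * 1e9  # 1 FIT = 1 failure per 1e9 device-hours


# Hermetic "Damaged Metal" row: 50 PPM defective, MTTF 848 hrs.
early = subpopulation_afr_fits(50e-6, 848.0, 0.0, 4_000.0)
later = subpopulation_afr_fits(50e-6, 848.0, 4_000.0, 30_000.0)
print(f"0-4 khrs: {early:.1f} FITs, 4-30 khrs: {later:.2f} FITs")
```

Because nearly every defective unit fails within the first few thousand hours, the early-interval rate is essentially the defective fraction divided by 4,000 hours, which is why the later intervals contribute almost nothing.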
Instantaneous Failure Rate at Field Conditions.
Curves Derived from General Reliability Model.

[Figure: instantaneous failure rate versus TIME (THOUSAND HOURS); legend entries: TOTAL HERMETIC, PLASTIC]


Traditional Method for Reliability Projection


Single Exponential Distribution Assumed, Ea = 1.0 eV
Stress Junction Temperature to Field Junction Temperature
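
Converting stress hours to equivalent field hours with a single activation energy is normally done with the Arrhenius relation. The sketch below illustrates that conversion under stated assumptions: the function names are hypothetical, and the temperatures in the example are ambient values, whereas the source works from junction temperatures that are not listed here, so the result will not reproduce the tabulated device hours.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, field_temp_c, stress_temp_c):
    """Acceleration factor from stress temperature to field temperature."""
    field_k = field_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / field_k - 1.0 / stress_k))


def equivalent_device_hours(sample_size, stress_hours, acceleration):
    """Device-hours at field conditions represented by a stress test."""
    return sample_size * stress_hours * acceleration


# Illustration only: Ea = 1.0 eV, 125°C stress, 55°C field (temperatures assumed).
factor = arrhenius_acceleration(1.0, 55.0, 125.0)
print(f"acceleration factor: {factor:.0f}")
print(f"equivalent hours: {equivalent_device_hours(22_718, 168, factor):,.0f}")
```

Using junction rather than ambient temperatures raises both temperatures by the device self-heating and therefore lowers the acceleration factor.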








Package    Stress            Sample    Equivalent Device   Rejects   Failure Rate
Type                         Size      Hours at 55°C                 (60% Confidence)
                                                                     (FITs)

Hermetic   168 hrs 125°C     22,718      931,618,604       9
           1000 hrs 125°C     7,060    1,724,695,417       0
           1000 hrs 150°C     5,709    6,530,194,964       1
           Totals            35,487    9,186,508,985       10        1

Plastic    168 hrs 125°C     18,338      368,584,446       7
           1000 hrs 125°C     6,580      740,419,943       0
           Totals            24,918    1,109,004,389       7         8
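
The failure rates in the last column are consistent with the usual upper confidence bound for a constant (exponential) failure rate. The sketch below assumes that bound is the chi-square limit with 2r + 2 degrees of freedom, computed here through the equivalent Poisson form so that only the standard library is needed. The source does not spell out its formula, so this is an illustration that happens to agree with the totals above, and the function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def expected_failures_upper_bound(rejects, confidence):
    """Mean m such that P(X <= rejects; m) = 1 - confidence.

    Equivalent to chi-square(confidence, 2 * rejects + 2) / 2.
    """
    low, high = 0.0, 10.0 * (rejects + 1) + 10.0
    target = 1.0 - confidence
    for _ in range(200):  # bisection; the CDF decreases as the mean grows
        mid = (low + high) / 2.0
        if poisson_cdf(rejects, mid) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def failure_rate_fits(rejects, device_hours, confidence=0.60):
    """Upper-bound failure rate in FITs (failures per 1e9 device-hours)."""
    return expected_failures_upper_bound(rejects, confidence) / device_hours * 1e9


print(round(failure_rate_fits(10, 9_186_508_985)))  # hermetic totals
print(round(failure_rate_fits(7, 1_109_004_389)))   # plastic totals
```

With zero rejects the bound reduces to -ln(1 - confidence) divided by the device hours, which is why a test with no failures still yields a nonzero projected rate.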


Package Related Tests


Stress         Package    Sample   Failure                 Number of   Percent
               Type       Size     Mechanism               Rejects     Rejected

Temperature    Hermetic   2,849    Lifted Metal            4           0.14
Cycle                              Cracked Oxide           3           0.11
                                   Package Seal Cracks     1           0.04
                                   Package Seal Voids      1           0.04
                                   Cause not found         4           0.14
                                   Totals                  13          0.46
               Plastic    2,603    Die Cracked             1           0.04
                                   Glassivation Cracked    2           0.08
                                   Corroded Metal          1           0.04
                                   Metal-Metal Short       1           0.04
                                   Cracked Oxide           4           0.15
                                   Water in Package        2           0.08
                                   Wire Neck Broken        1           0.04
                                   Intermetallics          5           0.19
                                   Totals                  17          0.65
Temperature    Plastic    2,201    Cause not found         1           0.05
Humidity                           Totals                  1           0.05
Pressure Pot   Plastic    2,959    Die Cracked             1           0.03
                                   Corroded Leads          1           0.03
                                   Corroded Metal          1           0.03
                                   Totals                  3           0.10
7.4 CMOS LATCH-UP TEST METHODS AND RESULTS


Latch-up is a phenomenon that occurs when a parasitic
PNPN structure on an IC chip is triggered and behaves
like an SCR between the VCC and GND rails. Once
initiated, the latch-up condition will persist until either the
power supply is removed or the device is destroyed. In
virtually all cases, the device is destroyed because of the
large current that can flow from the VCC to the ground pin
(the ON resistance of the SCR is very low).


Interior nodes of an IC could conceivably be prone to
latch-up, but this intrinsically rare condition would be
found during normal device testing and screening. Circuit
nodes interfacing with the “outside world” are much more
susceptible to latch-up because unusual transient
conditions may occur, in particular overshoot or ringing
that pulls the pin above the supply voltage or below GND.


To induce latch-up, the conditions on these pins must
meet two criteria: a) there must be sufficient voltage to
forward-bias critical junctions in the SCR, and b) the
available current must be in excess of the SCR trigger
current. If these conditions exist, and if a suitable
parasitic PNPN structure is connected to that pin,
latch-up will occur.


Some thought must be given to the test values of voltage
and current when determining susceptibility of a part to
latch-up. Reasonable test values would seem to be those
experienced in an actual system under worst-case
conditions.


Most AMD devices are designed to work with a nominal
+5V supply. In such a system, voltage transients resulting
from transmission line effects, etc., will not exceed
5V in magnitude. Therefore, testing at a +10V extreme
(VCC plus 5V transient) and a -5V extreme (GND minus
5V transient) will simulate a worst-case system
environment.


Current levels for latch-up testing are governed by the
maximum current available from any device in the system.
The maximum drive capability of any output pin is
approximately 100 mA; adding some margin to this, the
test value becomes 300 mA. Any current derived from the
voltage transient magnitude divided by the transmission
line impedance will be considerably less than this.
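
The arithmetic behind these test limits can be sketched as follows. The 5V supply, 5V transient, and 300 mA test current come from the text above; the 50-ohm and 100-ohm line impedances are assumed here purely for illustration, and the function names are hypothetical.

```python
def latchup_voltage_limits(vcc_nominal=5.0, max_transient=5.0):
    """Return the (upper, lower) clamp voltages for the latch-up test."""
    return vcc_nominal + max_transient, 0.0 - max_transient


def transient_current_ma(transient_volts, line_impedance_ohms):
    """Current a line transient can deliver into a pin, in milliamps."""
    return transient_volts / line_impedance_ohms * 1000.0


TEST_CURRENT_MA = 300.0  # test value stated in the text

upper, lower = latchup_voltage_limits()
print(f"clamp levels: +{upper:.0f} V and {lower:.0f} V")

# Assumed line impedances; not taken from the source.
for impedance in (50.0, 100.0):
    current = transient_current_ma(5.0, impedance)
    print(f"{impedance:.0f} ohm line: {current:.0f} mA of {TEST_CURRENT_MA:.0f} mA")
```

Under these assumed impedances the transient-driven current stays at or below the 100 mA drive figure, so the 300 mA test current leaves the margin the text describes.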


Latch-Up Testing 


Testing was performed by forcing 300 mA into and out of
each device pin, whether input or output, while monitoring
ICC for any indication of latch-up. The current sources
were voltage-limited at +10V and -5V, per the discussion
above. The test configurations are shown in Figures 7-4
and 7-5.


Normal outputs were set to the HIGH state when current 
was forced into the pin (positive current) and set to the 
LOW state when the current was pulled out of the device 
(negative). Outputs with three-state capability were addi- 
tionally tested in the high-impedance state. 


The test results are summarized in Table 7-7. For the test
limits indicated, no latch-up was induced for any pin of
any part of any device type tested.


Note that there was no positive current flow into the input 
pins since the inputs remained high-impedance up to the 
+10V clamp level. 





Figure 7-4. 


Figure 7-5. 
Table 7-7. CMOS Latch-Up Testing Summary
(Am29C01, Am29C10A, Am29C101)


Tested Pin             Test Figure   Max II (mA)   Max VI (V)   Latch-Up

Inputs                 1             0             +10          No
                       2             -18           -5           No
Normal Outputs         1             +300          +6.5         No
                       2             -300          -1.4         No
Three-State Outputs    1             +300          +6.6         No
(active)               2             -300          -1.8         No
Three-State Outputs    1             +300          +10          No
(High-Z)               2             -300          -1.8         No


7.5 TEST PHILOSOPHY AND METHODS


The following nine points describe AMD’s philosophy
for high volume, high speed automatic testing.


1. Ensure that the part is adequately decoupled at the
test head. Large changes in VCC current as the device
switches may cause erroneous function failures due to
VCC changes.


2. Do not leave inputs floating during any tests, as they
may start to oscillate at high frequency.


3. Do not attempt to perform threshold tests at high
speed. Following an output transition, ground current
may change by as much as 400 mA in 5-8 ns.
Inductance in the ground cable may allow the ground
pin at the device to rise by hundreds of millivolts
momentarily.


4. Use extreme care in defining input levels for AC
tests. Many inputs may be changed at once, so there
will be significant noise at the device pins and they may
not actually reach VIL or VIH until the noise has settled.
AMD recommends using VIL ≤ 0 V and VIH ≥ 3.0 V for
AC tests.


5. To simplify failure analysis, programs should be
designed to perform DC, Function, and AC tests as three
distinct groups of tests.


6. Capacitive Loading for AC Testing


Automatic testers and their associated hardware have
stray capacitance that varies from one type of tester to 
another but is generally around 50 pF. This, of course, 
makes it impossible to make direct measurements of 
parameters which call for smaller capacitive load than 
the associated stray capacitance. Typical examples of 
this are the so-called “float delays,” which measure the 
propagation delays into the high-impedance state and 
are usually specified at a load capacitance of 5.0 pF. 
In these cases, the test is performed at the higher load
capacitance (typically 50 pF) and engineering correla- 
tions based on data taken with a bench setup are used 
to predict the result at the lower capacitance. 


Similarly, a product may be specified at more than one
capacitive load. Since the typical automatic tester is
not capable of switching loads in mid-test, it is impossible
to make measurements at both capacitances
even though they may both be greater than the stray
capacitance. In these cases, a measurement is made
at one of the two capacitances. The result at the other
capacitance is predicted from engineering correlations
based on data taken with a bench setup and the
knowledge that certain DC measurements (IOH, IOL, for
example) have already been taken and are within
spec. In some cases, special DC tests are performed
in order to facilitate this correlation.
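

One simple form such an engineering correlation could take is a straight-line fit of delay against load capacitance from bench data, which is then used to shift a tester reading taken at 50 pF to the specified 5 pF. The sketch below assumes a linear relationship and uses invented bench numbers; neither the model nor the data comes from the source, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlate_delay_ns(measured_ns, measured_pf, target_pf, slope_ns_per_pf):
    """Predict the delay at target_pf from a measurement at measured_pf."""
    return measured_ns - slope_ns_per_pf * (measured_pf - target_pf)


# Hypothetical bench data: load capacitance (pF) versus float delay (ns).
bench_load_pf = [5.0, 15.0, 30.0, 50.0]
bench_delay_ns = [10.0, 10.5, 11.25, 12.25]

slope, _ = fit_line(bench_load_pf, bench_delay_ns)
predicted = correlate_delay_ns(14.0, 50.0, 5.0, slope)
print(f"slope: {slope:.3f} ns/pF, predicted delay at 5 pF: {predicted:.2f} ns")
```

In practice the correlation would also carry a guard band, since the bench slope varies from part to part.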


7. Threshold Testing


The noise associated with automatic testing (due to
the long, inductive cables) and the high gain of the
tested device when in the vicinity of the actual device
threshold frequently give rise to oscillations when
testing high speed circuits. These oscillations are not
indicative of a reject device, but instead of an overtaxed
test system. To minimize this problem, thresholds
are tested at least once for each input pin. Thereafter,
“hard” high and low levels are used for other
tests. Generally this means that function and AC
testing are performed at “hard” input levels rather than
at VIL Max. and VIH Min.


8. AC Testing


Occasionally, parameters are specified that cannot
be measured directly on automatic testers because of
tester limitations. Data input hold times often fall into
this category. In these cases, the parameter in question
is guaranteed by correlating these tests with other
AC tests that have been performed. These correlations
are arrived at by the cognizant engineer by using
precise bench measurements in conjunction with the
knowledge that certain DC parameters have already
been measured and are within spec.


In some cases, certain AC tests are redundant, since 
they can be shown to be predicted by some other 
tests which have already been performed. In these 
cases, the redundant tests are not performed. 


9. Output Short-Circuit Current Testing


When performing IOS tests on devices containing RAM
or registers, great care must be taken that undershoot
caused by grounding the high-state output does not
trigger parasitic elements which in turn cause the
device to change state. In order to avoid this effect, it
is common to make the measurement at a voltage
(VOUTPUT) that is slightly above ground. The VCC is
raised by the same amount so that the result (as
confirmed by Ohm’s law and precise bench testing) is
identical to the VOUT = 0, VCC = Max. case.
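

Point 3 of the test philosophy attributes the momentary rise of the device ground pin to inductance in the ground cable. The size of that effect can be estimated from V = L × di/dt, as sketched below. The 400 mA and 5-8 ns figures are from the text; the 5 nH ground-path inductance is an assumed value chosen only for illustration, and the function name is hypothetical.

```python
def ground_bounce_volts(inductance_nh, delta_current_ma, transition_ns):
    """Voltage across a ground inductance for a linear current ramp."""
    inductance_h = inductance_nh * 1e-9
    delta_current_a = delta_current_ma * 1e-3
    transition_s = transition_ns * 1e-9
    return inductance_h * delta_current_a / transition_s


ASSUMED_GROUND_INDUCTANCE_NH = 5.0  # hypothetical; depends on the test fixture

for transition in (5.0, 8.0):
    bounce = ground_bounce_volts(ASSUMED_GROUND_INDUCTANCE_NH, 400.0, transition)
    print(f"400 mA in {transition:.0f} ns: {bounce * 1000:.0f} mV")
```

Even a few nanohenries therefore produce a shift of hundreds of millivolts, comparable to the noise margin of a threshold test.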
8.1 PHYSICAL DIMENSIONS*


Plastic DIP (PD)
PD 028
[Package outline drawing]
PID # 10124A


Ceramic Sidebrazed DIP (SD)
SD 028
[Package outline drawing]
PID # 07930C


* For reference only.
NOTE: Package dimensions are given in inches. To convert to millimeters, multiply by 25.4.
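
The conversion in the note is a single multiplication. As a minimal sketch (the function name is hypothetical), applied to the .100 inch pin spacing called out on the pin-grid-array drawings:

```python
MM_PER_INCH = 25.4


def inches_to_mm(inches):
    """Convert a package dimension from inches to millimeters."""
    return inches * MM_PER_INCH


print(f"{inches_to_mm(0.100):.2f} mm")
```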
Plastic Leaded Chip Carrier (PC)
PL 028
[Package outline drawing]
PID # 06751E
Ceramic Pin-Grid-Array Packages (CG/CGX)
CGX120
[Package outline drawing, bottom view. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES)]
PID # 08900B


CG 120
[Package outline drawing, bottom view, with HEATSINK. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES)]
PID # 07429C
CGX145
[Package outline drawing, bottom view. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES)]
PID # 09691A
CG 145
[Package outline drawing, bottom view, with HEATSINK. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES); .100 BSC]
PID # 07321C
CGX169
[Package outline drawing, bottom view. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES); .100 BSC]
PID # 07322B
CG 169
[Package outline drawing, bottom view, with HEATSINK. Callouts: .075 x 45° REF. (REFERENCE CORNER); .030 x 45° REF. (3 PLACES); .100 BSC; 1.600 BSC]
PID # 09017B

Notes: 1. This dimension refers to heatsinks with only three fins. Heatsinks
          with more than three fins are as follows: 4 fins = .450/.510
                                                    6 fins = .540/.600
                                                    7 fins = .690/.750
8.2 ORDERING INFORMATION


All Advanced Micro Devices’ products listed are stocked locally and distributed nationally by Franchised Distributors.
See back of this book for the location nearest you. Please consult them for the latest price revisions. For direct factory
orders, call your local AMD Sales Office or Sales Representative. See the back of this book for the location nearest
you.





Minimum Order 


The minimum direct factory order is $100.00 for a standard product. The minimum direct factory order for burn-in 
product is $250.00. 


Product Ordering, Package and Temperature Range Codes 


The following scheme is used to identify Advanced Micro Devices’ Standard products:


Am29334   G   C   B
   |      |   |   |
   |      |   |   Optional Processing
   |      |   Temperature Range
   |      Package Type
   Device Number

Package Type                       Temperature Range     Optional Processing
P = Plastic DIP                    C = Commercial        Blank = Standard Processing
D = Ceramic DIP                        (0 to +70°C)      B = Burn-in
G = Pin Grid Array
J = Plastic Leaded Chip Carrier
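

The standard-product scheme above can be decoded mechanically. The sketch below is an illustration only: the function name, the regular expression, and the returned dictionary layout are assumptions of this example, and the lookup tables contain just the codes listed above.

```python
import re

PACKAGE_TYPES = {
    "P": "Plastic DIP",
    "D": "Ceramic DIP",
    "G": "Pin Grid Array",
    "J": "Plastic Leaded Chip Carrier",
}
TEMPERATURE_RANGES = {"C": "Commercial (0 to +70°C)"}
PROCESSING_OPTIONS = {"": "Standard Processing", "B": "Burn-in"}

# Device number, then package letter, temperature letter, optional "B".
_ORDER_CODE = re.compile(r"(Am\w+?)([PDGJ])(C)(B?)")


def parse_standard_order_code(text):
    """Split a standard-product order code into its four fields."""
    compact = "".join(text.split())  # accept "Am29334 G C B" or "Am29334GCB"
    match = _ORDER_CODE.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a recognized standard order code: {text!r}")
    device, package, temperature, processing = match.groups()
    return {
        "device": device,
        "package": PACKAGE_TYPES[package],
        "temperature": TEMPERATURE_RANGES[temperature],
        "processing": PROCESSING_OPTIONS[processing],
    }


print(parse_standard_order_code("Am29334 G C B"))
```

The device number is matched lazily so that trailing letters are interpreted as package, temperature, and processing codes first; a code such as Am29C334GC therefore keeps the "C" inside the device number.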


The following scheme is used to identify Advanced Micro Devices’ Military (APL) products:


Am29C334   /B   Z   C
    |       |   |   |
    |       |   |   Lead Finish
    |       |   Package Type
    |       Device Class
    Device Number

Device Class       Package Type                    Lead Finish
/B = Class B       X = DIP Packages                C = Gold
                   Z = All Other Configurations
                       (PGAs, etc.)